The tragedy of 100% code coverage (2016) (ig.com)
534 points by tdurden on May 9, 2017 | 340 comments

I've had to work on mission critical projects with 100% code coverage (or people striving for it). The real tragedy isn't mentioned though - even if you do all the work, and cover every line in a test, unless you cover 100% of your underlying dependencies, and cover all your inputs, you're still not covering all the cases.

Just because you ran a function or ran a line doesn't mean it will work for the range of inputs you are allowing. If your function that you are running coverage on calls into the OS or a dependency, you also have to be ready for whatever that might return.

Therefore you can't tell if your code is right just by having run it. Worse, you might be lulled into a false sense of security by saying it works because that line is "covered by testing".
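
A minimal sketch of that false sense of security, in Python. The function and test names are made up for illustration. The single test executes every line of `average`, so a line-coverage tool would report 100%, yet an input the function accepts still crashes it.

```python
def average(values):
    # One line of logic: any test that calls this gives 100% line coverage.
    return sum(values) / len(values)


def test_average():
    # Passes, and "covers" the whole function.
    assert average([2, 4, 6]) == 4
```

Calling `average([])` raises `ZeroDivisionError` even though the coverage report has nothing left to flag.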

The real answer is to be smart, pick the right kind of testing at the right level to get the most bang for your buck. Unit test your complex logic. Stress test your locking, threading, perf, and io. Integration test your services.

This was one of the fun things about working on Midori, the whole system was available to us, components all communicated through well defined interfaces, you really could (if necessary) mock the interfaces above and below and be sure that your logic was being exercised. The browser team, in particular, pushed hard to be close to 100% coverage. The overall number for the project was (IIRC) in the 85% range.

When I'm writing tests, I'm not so concerned about what the tool tells me about which lines are covered, I like to work through the mental exercise of knowing which basis paths are covered. If a function has too many for me to reason about, that is a problem in itself.

As an aside, while I'm rambling, all the examples in the article appeared to represent unnecessary abstraction, which is the opposite problem. If you have many methods in a class with only one basis path, what purpose does the class serve. These testing concerns may be the code smell that points to deeper problems

> As an aside, while I'm rambling, all the examples in the article appeared to represent unnecessary abstraction, which is the opposite problem.

One other thing about code coverage, unit testing, and other testing fads is I think they actively affect the architecture, and usually in the overthinking it way.

Instead of having one tight bit of procedural code (which may have some state, or some dependency calls), people split it up into multiple classes, and then test each bit. This allows them to use the class architecture to mock things, but really has just multiplied out the amount of code. And in the end, you're running tests over the golden path probably even less. It's even possible to have 100% of the code covered, and not run the golden path, because you're always mocking out at least one bit.

I think there's an argument to be made that if the desire for testing is a main driver for your architecture then your tests are too granular, and you aren't testing any internal integration points. In my experience that means that your tests are so tied to your current architecture that you can't even refactor without having to rewrite tests.

mocking / interaction / expect breaks encapsulation to perform "testing".

Thus it is often a test of the implementation's assumptions when first written, and even worse, when the code is maintained/edited, the test is merely changed to get it to pass, because unit tests with mocks are usually:

1) fragile to implementation

2) opaque as to intent

Whereas input/output integration points are more reliable, transparent, and less fragile to implementation changes if the interface is maintained.

However, if you must do mock-level interaction testing, Spock has made it almost palatable in Javaland.

This is one area where functional fans get to make the imperative folks eat their lunch.

Exactly. I've seen tests where some code calls:

  printf("hello world");
And the test is:

  mock_printf(string) {
    if string != "hello world" then fail;
  }
Which is basically just duplicating your code as tests.
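
For comparison, here is roughly what the two styles look like in Python, as a sketch only. `greet` and both test names are hypothetical. The first test restates the implementation through a mock; the second checks only what actually reaches standard output.

```python
import contextlib
import io
from unittest import mock


def greet():
    print("hello world")


def test_greet_by_interaction():
    # Mirrors the implementation: it would fail if greet() switched to
    # sys.stdout.write, even though the visible output stayed the same.
    with mock.patch("builtins.print") as fake_print:
        greet()
    fake_print.assert_called_once_with("hello world")


def test_greet_by_output():
    # Checks the observable result only.
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        greet()
    assert captured.getvalue() == "hello world\n"
```

Both pass today; only the second would keep passing after a behavior-preserving rewrite of `greet`.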

I think that depends on the language/environment... for example, it's usually MUCH easier to get high coverage, and to do just enough mocking testing against Node-style modules in JS than logic in a typical C# or Java project.

Usually because interfaces need to be clearly/well defined and overridden... In JS there are tools (rewire, proxyquire, etc) that can be used with tests in order to easily inject/replace dependencies without having to write code to support DI nearly as much. In fact, I'd say that there's usually no reason not to be very close to 100% coverage in JS projects.

It's strange to me that we continue to make languages that force us to make certain architectural choices to help facilitate testing.

We have had it since Eiffel and Common Lisp, but not all mainstream languages were keen on adopting design by contract.

On .NET it is a plain library, which requires the VS Ultimate editions to be useful and on Java it was mostly third party libraries.

C# design team is considering adding proper support as language feature, but it seems to be a very far away feature still.

C++20 might get contracts, but C++17 just got ratified, so who knows.

D, Ada 2012 and SPARK do already support contracts.

Indeed. It's almost as if testing should be built into the language itself. (I'm always a big fan of including self-test code in projects)

And it is built in in D:


It's been very effective at improving the overall quality of the code. And because of CTFE (Compile Time Function Execution) and static assert, many code correctness tests can even be run at compile time.

What are the languages that have testing not as afterthought?

This depends on how you think of things. Some languages (Eiffel?) have pre- and post-conditions, which do some of this job. Much of the boilerplate testing done for dynamic languages is done by the compiler for static languages. The borrow checker in Rust is doing the work that a test suite and/or static analysis tool would be doing for C/C++

Definitely Ruby in general, testing is a core of the language culture.

That doesn't mean testing isn't an afterthought in Ruby, the language. It means the culture built up a defense against writing buggy code.

I think that's the nature of a dynamic/scripted language... it's just easier to handle in Ruby and JS than many other languages.

> testing should be built into the language itself

I wonder what that would look like...

Rust has simple unit testing built into the language. And in Ada/Spark, tests can be a first class verification tool, alongside formal verification.

We should go a lot further though. IMO, a unit that does not pass a spec/test should cause a compile time error. Testing systems should facilitate and converge with formal verification. Where possible, property based testing should be used and encouraged. And debugger tools should be able to home in on areas where the result diverges from expectations.

> IMO, a unit that does not pass a spec/test should cause a compile time error.

We can achieve this in dependently-typed languages like Idris. First we define a datatype 'Equal x y', which will represent the proposition that expression 'x' is equal to expression 'y':

    data Equal x y where
      Refl : (x : _) -> Equal x x
There are two things to note:

- There is only one way to construct a value of type `Equal x y`, which is to use the constructor we've called `Refl`.

- `Refl` only takes one argument, `x` (of arbitrary type `_`), and it only constructs values of type `Equal x x`. This is called "reflexivity", thus the name `Refl`.

Hence if we use the type `Equal x y` anywhere in our program, there is only one possible value we can provide that might typecheck (`Refl x`), and that will only typecheck if `Equal x x` unifies with `Equal x y`, which will only happen if `x` and `y` are the same thing; i.e. if they are equal. Thus the name `Equal`.

This `Equal` type comes in the standard library of all dependently typed languages, and is widely used. To use it for testing we just need to write a test, e.g. `myTest`, which returns some value indicating pass/fail, e.g. a `Boolean`. Then we can add the following to our program:

    myTestPasses : Equal myTest True
    myTestPasses = Refl myTest
This will only type-check if `Equal myTest myTest` (the type of `Refl myTest`) unifies with `Equal myTest True`, which will only be the case if `myTest` evaluates to `True`.

Force.com doesn't allow you to deploy code to production unless at least 75% of the statements are executed[0]. I wish my employer had a similar requirement.

[0] https://developer.salesforce.com/docs/atlas.en-us.apexcode.m...

^ This.

And another example of something I'd want checked by the language is exception throwing / handling. It's another one of those places where code coverage won't help you unless you already know what you're looking for. Languages are getting better about it, but in general, handling errors is hard.

Pure languages like haskell might be the closest thing at the moment.

In haskell you embed domain specific languages when you require side effects:

    class MonadState s m where
        get :: m s
        put :: s -> m ()
This is like an interface specifying the basics of all stateful computations.

You can use different implementations for production and testing without changing any code so mocking is built into everything.

Didn't Smalltalk do something like that? I thought it had TDD built into its standard IDE.

Dynamic languages can kinda cheat here. Python has mocking built in to the core lib.
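
A small sketch of what that looks like with the standard library's `unittest.mock`. The function `describe_cwd` is invented for the example. The mock forces an OS call to fail, which is hard to arrange for real and which a plain coverage run would never exercise.

```python
import os
from unittest import mock


def describe_cwd():
    # Hypothetical code under test: it must cope with the OS call failing.
    try:
        return f"cwd is {os.getcwd()}"
    except OSError:
        return "cwd unavailable"


def test_describe_cwd_when_os_fails():
    # side_effect makes the patched os.getcwd raise instead of returning.
    with mock.patch("os.getcwd", side_effect=FileNotFoundError()):
        assert describe_cwd() == "cwd unavailable"
```

No dependency injection was written into `describe_cwd`; the patch swaps the attribute on the `os` module for the duration of the `with` block.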

> If you have many methods in a class with only one basis path, what purpose does the class serve.

It ties together related algorithms. You want "addDays(n)", "addHours(n)", "addMinutes(n)" and so on to be in the same class, even if they're one-liners.

Clearly there are good examples either way. I'm more interested in the idea that this strong mismatch between the complexity of the code and the complexity of its tests is, in itself, a code smell that may point to an issue that is nothing to do with TDD.

I note that the examples you give have trivial test cases - presuming you don't care about overflow, and if you do, then those methods now have basis paths, whether explicit or implicit depends on the language.

This is what I love about the design and operational philosophy of the Erlang/OTP environment: the trickier class of harder-to-test/reproduce transient/timing pre-defensive-coding issues can sometimes be resolved automagically under supervision trees restarting said Erlang process. Log the stacktrace as a failure to maybe be fixed and move on. That's in addition to a robust soft-realtime, SMP VM which allows remote, live debugging.

As a newbie to Erlang, how does Erlang handle this case better for a service than a Python process which logs and restarts on failure? Also, this runtime testing strategy doesn't handle cases where you have other systems depending on that Erlang code path being successful, does it?

If you absolutely want to cover all cases, you need to do mutation testing. A mutation testing system analyses your code, changes an operator (> becomes < or >= for example), and then runs your tests. If one test fails, your tests covered that statement.
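
A hand-rolled sketch of the idea in Python, with the mutation applied manually rather than by a tool. All names are made up. The "mutant" swaps `>=` for `>`; a suite that never tests the boundary lets it survive, and one that does kills it.

```python
import operator


def make_is_adult(compare):
    # Build the function under test with a pluggable comparison operator,
    # so the "mutation" is just passing a different operator.
    def is_adult(age):
        return compare(age, 18)
    return is_adult


original = make_is_adult(operator.ge)  # age >= 18
mutant = make_is_adult(operator.gt)    # age > 18, the mutated version


def weak_suite(is_adult):
    # Executes the comparison but never probes the boundary.
    return is_adult(30) and not is_adult(5)


def strong_suite(is_adult):
    # Adds the boundary case that distinguishes >= from >.
    return weak_suite(is_adult) and is_adult(18)


def killed(suite, mutated_fn):
    # A mutant is "killed" when the suite fails on it.
    return not suite(mutated_fn)
```

Here `killed(weak_suite, mutant)` is false and `killed(strong_suite, mutant)` is true, although both suites give the original the same line coverage.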

It seems to me you need to have serious OCD to go for 100% mutation coverage, but that is what you really need to do if perfect coverage is your aim.

The other option is to accept that perfect coverage is not feasible, or possibly even a trap, and that you just need to cover the interesting bits.

Strictly, mutation testing can't assure all cases either -- combinatorial state explosions are fun. But it does improve assurance a lot.

How would automated mutation testing handle the case of accidentally causing infinite loops, or invoking undefined behavior?

I use pitest to run mutation coverage on most of my Java code bases. pitest implements a timeout to check for infinite loops introduced by changing the code.


Pitest has solved the halting problem?

In the real world, you can't wait five minutes for the server to return a response. If the server returns the correct response after ten minutes, it's still wrong, unless the programmer has explicitly acknowledged that the procedure in question is a long-running procedure and has lengthened (but not eliminated) the time-out accordingly.

The halting problem is an irrelevant, Ivory-tower distraction in production code.

Timeouts are not a perfect solution to the halting problem, but usually good enough.
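
To make the timeout approach concrete, here is a rough Python sketch. It is not how pitest is implemented; the helper name and the two source snippets are invented. Each variant runs in a child process, and a run that exceeds the time budget is simply classified as timed out, which a mutation tool can then count as a detected mutant.

```python
import subprocess
import sys


def run_with_timeout(source: str, timeout_s: float) -> str:
    """Run Python source in a child process and classify the outcome."""
    try:
        result = subprocess.run([sys.executable, "-c", source], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising.
        return "timed out"
    return "passed" if result.returncode == 0 else "failed"


ORIGINAL = "i = 0\nwhile i < 10:\n    i += 1\nassert i == 10\n"
# Mutant: += changed to -=, so the loop condition never becomes false.
MUTANT = "i = 0\nwhile i < 10:\n    i -= 1\nassert i == 10\n"
```

`run_with_timeout(ORIGINAL, 10.0)` finishes normally, while `run_with_timeout(MUTANT, 1.0)` hits the budget. Nothing here decides whether the mutant would ever halt; it only decides that it took too long.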

There can be circumstances in which the substituted operator is just as valid as the operator being substituted for, resulting in false positives.

How would that be a false positive? If the substituted operator does not cause a test to fail, then your tests don't cover it. If the function of the program is not changed by changing the operator, then the operator does nothing and should be removed.

That's the idea at least. All focus on high coverage, whether line coverage or mutation testing, creates an incentive to remove redundant robustness checks, and maybe that's not such a good idea after all. But that's a problem with all unit testing, and not just with mutation testing.

I was thinking, for example, that a >= test is as valid as == in some cases. If == is actually used and is correct, substituting >= would not cause any valid test to fail. As I wrote this, however, I realized that if you substitute the inverse operator (!= for ==, < for >= etc.) and it is not caught, you can infer it is not covered. (edit: or that, in the context of the program as a whole, the result of the operation is irrelevant. I imagine this might legitimately happen, e.g. in the expansion of a macro or the instantiation of a template.)

Substituting == for >= should cause a test to fail. Either == is correct, or >=. They can't both be correct. Should the two things always be equal? Then they shouldn't be unequal. Is one allowed to be bigger than the other? Then that should be allowed, and not rejected.

If this change doesn't cause a test to fail, you're not testing edge cases for this comparison.

(Of course that's assuming you want perfect coverage.)


  while (j < JMAX) {
     i = callback(j);
     if (i >= FINAL) break;
  }
If, in the specific circumstances of this use, callback(j) will always equal FINAL before it exceeds it, a test for equality here will not cause the program to behave differently.

The fallacy of your argument is that == is not an assertion that its arguments should always be equal; it is a test of whether they are at some specific point in the algorithm.
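
A Python rendering of that loop, as a sketch with assumed values for `FINAL` and `JMAX`, an assumed increment of `j`, and two invented callbacks. With a callback that rises by exactly one per step, `>=` and `==` are indistinguishable from the outside; with one that can skip past `FINAL`, they are not.

```python
import operator

FINAL = 5    # assumed value, for illustration
JMAX = 100   # assumed value, for illustration


def steps_until_final(callback, reached):
    # Same shape as the loop above; `reached` stands in for >= or ==.
    j = 0
    while j < JMAX:
        i = callback(j)
        if reached(i, FINAL):
            break
        j += 1
    return j


def by_ones(j):
    return j        # always equals FINAL before it can exceed it


def by_twos(j):
    return 2 * j    # 0, 2, 4, 6: jumps over FINAL
```

With `by_ones`, both operators stop at the same `j`, so no test on this function alone can tell the mutant from the original. With `by_twos`, `operator.ge` stops early and `operator.eq` runs all the way to `JMAX`.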

Which is why I really love things like MSR's PEX[1].

"Pex uses a constraint solver to produce new test inputs which exercise different program behavior. "

So this is not lines, but I guess.. branches and numerical limits. Either way it's cool and I wish it was integrated somewhere. They've got an online demo thing http://www.pexforfun.com/


IntelliTest[0] is the productized version of PEX. The first release was in Visual Studio 2015 Enterprise edition.

[0] https://blogs.msdn.microsoft.com/visualstudio/2015/09/30/int...

Still a pity it's only included in Enterprise edition. We need a strong foundation of accessible tools for automated test input generation improving on the current state of the art fuzzers. See e.g. danluu's vision of "Combining AFL and QuickCheck for directed fuzzing" [https://danluu.com/testing/].

I only have one piece of code for which I can say true 100% coverage exists: a library that works with HTML/CSS color values, and which ships a test that generates all 16,777,216 hexadecimal and integer rgb() color values, and runs some functions with each value.

However, I don't run that as part of the normal test suite. It only gets run when I'm prepping a new release, as a final verification step; the normal runs-every-commit test suite just exercises with a selection of values likely to expose obvious problems.

Just out of curiosity... What is the point in testing all 2^24 possible color values?

The point is being certain I haven't missed an edge case somewhere.

The hard part is the percentage rgb() values, of which there are technically an uncountably infinite number (since any real number in the range 0-100 is a legal percentage value). For those I generate all 16,777,216 integer values, and verify that converting to percentage and back yields the original value.
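
A sketch of that kind of round-trip check in Python. The conversion functions below are simple stand-ins written for this example, not the library's actual API, and the real test may differ in how it rounds.

```python
from itertools import islice, product


def to_percent(channel: int) -> float:
    # Hypothetical conversion: 0..255 integer to a 0..100 percentage.
    return channel / 255 * 100


def from_percent(percent: float) -> int:
    # Hypothetical inverse: percentage back to the nearest 0..255 integer.
    return round(percent / 100 * 255)


def all_triplets():
    # Every (r, g, b) combination: 256 ** 3 == 16,777,216 of them.
    return product(range(256), repeat=3)


def roundtrip_ok(triplet) -> bool:
    return tuple(from_percent(to_percent(c)) for c in triplet) == tuple(triplet)


def check(triplets) -> bool:
    return all(roundtrip_ok(t) for t in triplets)
```

`check(all_triplets())` is the full release-time run; something like `check(islice(all_triplets(), 10_000))` is a cheap version for every commit.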

Hmm...true 100% input coverage? Including the time domain? (Order dependency, Idempotence, etc...) :)

how do you know the test that generates those values is correct?

Python's test suite.

How long does that take?

Depends on the machine. See the comments in the file:


The fun part is people who criticize the generation of the integer triplets; yes, it's three nested loops and that's bad, but the total number of iterations will always be 16,777,216 no matter what algorithm you decide to use to generate them. So it uses nested loops since that's the most readable way to do it.

I don't see how that's a "tragedy", as in "an event causing great suffering, destruction, and distress, such as a serious accident, crime, or natural catastrophe".

You're also making it sound like somebody promised you that tests can prove the absence of bugs, when that was never the bargain and smart people have already told you so, probably before you were born even.

> Testing shows the presence, not the absence of bugs

Dijkstra (1969)

If we're being pedantic, a tragedy is _a drama or literary work in which the main character is brought to ruin or suffers extreme sorrow, especially as a consequence of a tragic flaw, moral weakness, or inability to cope with unfavorable circumstances_.

Definitely not a tragedy.

If we're being pedantic, there's a second common definition of the word tragedy that does not refer to a drama or literary work, but "an event causing great suffering, destruction, and distress, such as a serious accident, crime, or natural catastrophe."

I think when you take all of the time wasted on useless tests written merely for the sake of having tests, that waste is tragic. You could be doing anything else with that time.

I am curious if there is any good way to measure that amount of waste. I agree that waste from extra and useless tests exists, but I am not convinced that waste is entirely bad.

It seems to me that waste is unavoidable with teams newer to automated testing and may simply be part of the cost of using automated tests. If that is the case, then it seems better to compare the cost of bugs with no automated tests to the cost of superfluous tests. In that comparison, extra tests definitely seem like the lesser of two evils, even without hard numbers. I would prefer hard numbers because my intuition could be wrong.

Well, if the organization tracks the time developers spend doing things, then it should be easy to estimate the waste.

1. Take the number of hours spent writing tests.

2. Multiply by whatever percentage of the tests are unnecessary.

3. Multiply by the labor cost per hour. Or revenue that was not made (e.g. if you could have billed those hours to a customer, but didn't).

The resulting number could either be a big deal or not, depending on how big your organization is and how much time you spent on superfluous tests.
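
The three steps amount to one multiplication. A throwaway sketch, where every figure in the usage note is invented purely to show the arithmetic:

```python
def estimated_waste(test_hours: float, unnecessary_fraction: float,
                    cost_per_hour: float) -> float:
    # hours spent on tests * share judged unnecessary * cost (or lost revenue) per hour
    return test_hours * unnecessary_fraction * cost_per_hour
```

For example, `estimated_waste(400, 0.25, 80)` models 400 hours of test writing, a quarter of it judged unnecessary, at 80 per hour. The hard input is the middle one: someone has to decide which tests were unnecessary.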

I agree, I was just kicking GP's pedantry up a notch.

There's use in looking at the testing for industrial strength code e.g. compilers, DirectX, OpenGL conformance (god help you). With that background, I was confused by TDD to put it politely.

Code coverage should perhaps not be counted in lines, but in the number of type permutations covered. Then your challenge is determining all the types that _can_ be passed as inputs.

By types, I assume you mean categorically different types of data, rather than the types of the type system that your language recognizes?

I agree that test coverage does not ensure that the code is really tested. But uncovered code is untested code. So I use coverage in this way: to spot untested code.

You can test stuff manually.

You really shouldn't, or if you do it should be an extreme minority of tests and automated tests should do the heavy lifting.

Automated tests can do so much more than manual tests that shops still living by manual tests either have a damn good reason or are just as wrong as people who argued against Revision Control Systems (or people who argued for gotos instead of functions).

Automated tests will be executed identically each time, so no test cases go missing because someone slacked off or made a typo. Automated tests can serve as examples of how to use the code. Automated tests can aid in porting to new platforms: once it builds, you can swiftly find all the bugs you care about. Automated tests can be integrated with documentation; tools like Doxygen and mdBook make this easier.

Automated tests enable Continuous Integration. Are you familiar with Travis CI or Jenkins? If not, imagine a computer that a team commits their code to, instead of directly to the mainline Revision Control (Git master, svn head, etc...). That computer builds the software, runs all the tests, perhaps on every supported platform or in an environment very close to production, then only merges commits that appear to fully work. This doesn't completely eliminate bugs and broken builds, but the change is so large that teams without it are at a clear competitive disadvantage.

When integrated into the process, tests can be used to protect code from changes. If a test exercises an API and the team knows that is its purpose, then when they change things in a way that breaks the test, they shouldn't... This sounds vague or obvious, but consider this: Facebook and Google have a rule that if code is not tested, new changes don't have to care if they break it. Both companies have teams that make broad, sweeping changes. Facebook wrote a new std::string, and Google uses clang tools to make automated changes in thousands of places at once. Even if code breaks or APIs change, as long as the tests pass these people can be sure that they have not negatively impacted the product and are following their team's rules.

Automated Tests can... This list could go on for a very long time.

Maybe you're thinking about some specific applications/pieces of code? Probably not a video game?

I think it should be a mix. I disagree with the "extreme minority" portion for most projects. There are times where a manual test is the right answer and there's exploratory manual testing.

Obviously, though, running through a long sequence of tests manually for every code change is crazy. And some sort of CI setup like you describe is a must for every project in this day and age.

Then there's also the question of unit tests vs. system/integration tests...

I think that mutation testing should replace code coverage. It really ensures that the tests verify the code's behavior and don't merely invoke it.

The trouble with mutation testing is that most mutations will completely break an application.

Why not prove the code is correct instead? Should be much cheaper than 100% coverage and more certain.

Why do you say most mutations will completely break an application? This is certainly not my experience.

Mutation testing systems normally use fairly stable operators (e.g. changing a > to a >=). In most locations in the code, changes such as these will have only a subtle effect.

> even if you do all the work, and cover every line in a test, unless you cover 100% of your underlying dependencies, and cover all your inputs, you're still not covering all the cases.

On the other hand, if you do cover all your inputs, you've covered all the cases regardless of what % code coverage or path coverage you have.

Tests are not meant for ensuring it works with all inputs. You can't simply throw values at it hoping it's all okay.

To prove it works with all possible inputs, there are other tools at your disposal.

I'm reminded of a guy tasked with testing a 32 bit floating point library. After a number of false starts he realized the most effective way possible.

Brute force.

Oh, another story. An OG (original geek) I know once proved the floating point unit on a mainframe was broken via brute force. The bad part meant the decimal floats used by, and only by, the accounting department sometimes produced erroneous results.

Me, I have a pitiful number of unit tests for the main codebase I work on. One module which does have unit tests also had a bug that resulted in us having to replace a couple of thousand units in the field.

Otherwise most of the bugs that chap my keister aren't about procedural correctness, they're about state.

How do you test it by brute force? How is the algorithm supposed to know what the right thing to return is, unless you rewrite the whole thing, correctly this time? And, if you've done that, just replace the actual function with the test and be done with it.

One answer to that is to apply property based testing. If you're testing an addition algorithm, for example, you might test that

   A + B == B + A
   A + 0 == A
   (A + B) + C == A + (B + C)
   A + B != A + C (given B != C)
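
Those four properties translate fairly directly into a randomized check. This is a standard-library-only sketch with made-up names; it uses integers, where all four properties hold exactly, and a dedicated property-testing library would add input shrinking and smarter generation.

```python
import operator
import random


def check_addition_properties(add, trials=1000, seed=0):
    # Raises AssertionError on the first violated property.
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.randint(-10**9, 10**9) for _ in range(3))
        assert add(a, b) == add(b, a)                  # commutative
        assert add(a, 0) == a                          # identity
        assert add(add(a, b), c) == add(a, add(b, c))  # associative
        if b != c:
            assert add(a, b) != add(a, c)              # cancellation
    return True


def buggy_add(a, b):
    # Deliberately wrong: drops the sign of the second operand.
    return a + abs(b)
```

`check_addition_properties(operator.add)` passes without the test ever stating what any particular sum should be, and `check_addition_properties(buggy_add)` raises.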

It's a good idea in principle, though your third and fourth properties are not true of floating point addition.
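
Two concrete counterexamples with IEEE 754 doubles, written as Python assertions (the specific values are chosen for illustration):

```python
# Associativity fails: the grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3
assert (a + b) + c != a + (b + c)

# Cancellation fails: near 1e16 adjacent doubles are 2 apart,
# so adding 1.0 is absorbed and two different addends give the same sum.
big = 1e16
assert big + 1.0 == big + 0.0

# Commutativity and the zero identity still hold for these values.
assert a + b == b + a
assert a + 0.0 == a
```

So a property suite for floating point addition has to either drop those two properties or restate them with a tolerance.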

When you have one available, testing the results against a reference implementation is a little more direct. Though, given how easy those property tests are, you might as well do both.

Your comparison source could be running on a different platform, or be slower, or ... and thus not be a drop-in replacement. (E.g. in the example of a broken floating-point unit, compare the results with a software emulation)

Hmm, true, it may have been a different platform/slower, thanks. It can't have been hardware/software, as the GP said they were testing a library (which implies software).

> How is the algorithm supposed to know what the right thing to return is

Sometimes I have, for example, MATLAB code that I 'know' is right and want to verify that my C/Python/Julia... code does the same thing. So then I just call the MATLAB code, call the new code and check if they're the same.

Or I have a slow exact method and want to see if my faster approximation always gives an answer 'close enough' to the correct answer.
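
A sketch of that comparison in Python. Here `math.sqrt` plays the trusted reference and a hand-written Newton iteration plays the new code; the function names, the sampling range, and the tolerance are all choices made for this example.

```python
import math
import random


def newton_sqrt(x: float, iterations: int = 30) -> float:
    # The "new" implementation under test.
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess


def agrees_with_reference(candidate, reference, samples=10_000,
                          rel_tol=1e-9, seed=0) -> bool:
    # Feed both implementations the same random inputs and compare results.
    rng = random.Random(seed)
    for _ in range(samples):
        x = rng.uniform(0.0, 1e6)
        if not math.isclose(candidate(x), reference(x), rel_tol=rel_tol):
            return False
    return True
```

`agrees_with_reference(newton_sqrt, math.sqrt)` is the whole test. The reference can be slow, run on another platform, or live in another language, as long as its answers can be brought next to the candidate's.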

With floating-point math libraries especially, it's frequently the case that it's relatively easy to compute the desired result accurately by using higher precision for internal computations; e.g. you might compare against the same function evaluated in double and then rounded to float.

This sounds pretty sketchy at first, but it turns out that almost all of the difficulty (and hence almost all of the bugs) are in getting the last few bits right, and the last few bits of a much-higher precision result don't matter when you're testing a lower-precision implementation, so combined with some basic sanity checks this is actually a very reasonable testing strategy (and widely used in industry).

If a function takes a single 32 bit argument you can run it on every possible value.
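
A scaled-down sketch of the exhaustive approach in Python, using a 16 bit domain so it finishes in well under a second; the full 32 bit sweep is the same loop and is practical in a compiled language. The bit-trick popcount is just an example function to check, and the slow string-based count serves as the oracle.

```python
def popcount16(x: int) -> int:
    # Bit-twiddling population count for 16 bit values (the code under test).
    x = x - ((x >> 1) & 0x5555)
    x = (x & 0x3333) + ((x >> 2) & 0x3333)
    x = (x + (x >> 4)) & 0x0F0F
    return (x + (x >> 8)) & 0x1F


def popcount_reference(x: int) -> int:
    # Obviously-correct but slow oracle.
    return bin(x).count("1")


def first_mismatch(candidate, reference, bits=16):
    # Try every value in the domain; return the first disagreement, if any.
    for x in range(1 << bits):
        if candidate(x) != reference(x):
            return x
    return None
```

`first_mismatch(popcount16, popcount_reference)` returning `None` means every one of the 65,536 inputs agrees. Note that this still needs something to compare against.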

… this reply does nothing to answer the parent's question. It's obvious that you can run it on all possible values, but that doesn't answer the question of the return values. (unless you only want to verify "doesn't crash" or something similarly basic)

I believe the question was edited after I wrote my answer. This, or I just misread it - unfortunately HN doesn't let us know. Anyway, I'm providing another answer to the question as it is now.

Validating the answer is different than providing it:

   FLOAT_EQ(sqrt(x) * sqrt(x), x) && sqrt(x)>=0.0
is a pretty good test for the square root even though it doesn't compute it.
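
The same check as a small Python sketch, with `math.isclose` standing in for `FLOAT_EQ` and a tolerance picked arbitrarily for the example:

```python
import math


def looks_like_sqrt(root: float, x: float, rel_tol: float = 1e-9) -> bool:
    # Validates a claimed square root without computing one:
    # it must be non-negative and square back to x within tolerance.
    return root >= 0.0 and math.isclose(root * root, x, rel_tol=rel_tol)
```

`looks_like_sqrt(math.sqrt(2.0), 2.0)` holds, while the negative root and a wrong value such as `1.5` are both rejected. Combined with a brute-force loop over inputs, this gives a test that needs no second implementation.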

What tools would that be?

I'd like to hear about real world test scenarios where all possible inputs are tested.

In practice, testing _all_ possible inputs is usually not a feasible goal, and often downright impossible (since it would mean solving the halting problem). While there are some languages and problem domains where formal verification methods are a possibility, for most of software development the best you can do is reduce the solution space.

A sane type system would be a start, for example. And it's no coincidence that functional programming and practices with emphasis on immutability are on the rise; Rust's ownership system is a direct consequence as well.

TDD _is_ important, if simply for enabling well-factored code and somewhat guarding against regression bugs. But - decades after Dijkstra's statement (which someone has already posted in this thread) - the code coverage honeymoon finally seems to be over.

Polyspace [0] allows full state space checking. Usually, this is feasible because the kind of modules you test are heavily constrained 12 bit (or fewer) fixed point values (sensor output / actuator input) and contain no algebraic loops. With floats and/or large integers, this quickly becomes unwieldy (verification then takes weeks).

[0] https://de.mathworks.com/products/polyspace.html

One of them would be fuzz testing. You start with a predefined 'corpus' of inputs, which are then modified by the fuzzer to reach code paths not yet covered. In the process it discovers many bugs and crashes.

The input fuzzing process is rarely purely random. There are advanced techniques that allow the fuzzer to link input data to the conditions of uncovered branches.
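
A toy illustration of coverage feedback, in Python. This is a sketch of the general idea only, far simpler than what go-fuzz or AFL do: the target is instrumented by hand, the planted bug and all names are invented, and the mutator only ever replaces one byte.

```python
import random


def target(data: bytes) -> set:
    """Toy parser, instrumented by hand: returns the ids of the branches taken."""
    hit = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        hit.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            hit.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                raise ValueError("bug reached")  # the planted defect
    return hit


def mutate(data: bytes, rng: random.Random) -> bytes:
    # Replace one randomly chosen byte with a random value.
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_corpus, iterations=200_000, seed=0):
    """Return a crashing input, or None if the budget runs out."""
    rng = random.Random(seed)
    corpus = list(seed_corpus)
    seen = set()
    for data in corpus:
        seen |= target(data)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            hit = target(candidate)
        except ValueError:
            return candidate
        if not hit <= seen:
            # New branch covered: keep this input as a base for later mutations.
            seen |= hit
            corpus.append(candidate)
    return None
```

Starting from `fuzz([b"AAAA"])`, blind random four-byte inputs would need on the order of 256 ** 3 attempts to hit the bug. Keeping inputs that reach new branches lets the search solve one byte at a time, which typically takes a few thousand iterations.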

It is quite a useful mechanism for checking inputs, formats, and behaviour patterns (if you have two solutions but one model: a simple one that works 100% but is slowish, and a more complex one that is very fast).

See: https://github.com/dvyukov/go-fuzz#trophies and http://lcamtuf.coredump.cx/afl/

Fuzzing is essentially the same as just throwing values at it hoping it's all okay.

What I meant was what @AstralStorm said about mathematical and logic proofs that it works for all defined values.

Fuzz testing is useful at testing whole systems in vivo.

Proofs are powerful, but you can still make errors in the proof or the transcription into code. And of course you don't know what other emergent behaviours will surprise you when the proved code interacts with unproved code.

Depending on your means and needs, you want to try both.

Parent was talking about full coverage. While fuzzers are great tools, able to generate test inputs for easily coverable parts of the code, formally correct software has to be designed for provability.

Many embedded safety and medical applications prove correctness for all inputs and their combinations. Some also verify error behaviour (out of range). Granted, this is a relatively small input space, typically a few sensors.

There are a few relevant facts that should be known to everyone (including managers) involved in software development, but which probably are not:

1) 100% path coverage is not even close to exhaustively checking the full set of states and state transitions of any usefully large program.

2) If, furthermore, you have concurrency, the possible interleavings of thread execution blow up the already-huge number of cases from 1) to the point where the latter look tiny in comparison.

3) From 1) and 2), it is completely infeasible to exhaustively test a system of any significant size.
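
Some back-of-the-envelope arithmetic behind points 1) and 2), as a Python sketch. It uses a simplified model chosen for this example: independent two-way branches for paths, and threads made of atomic steps with no synchronization for interleavings.

```python
from math import factorial


def branch_paths(independent_branches: int) -> int:
    # Paths through straight-line code containing n independent if-statements.
    return 2 ** independent_branches


def interleavings(threads: int, steps_each: int) -> int:
    # Distinct orderings of `threads` threads that each run `steps_each`
    # atomic steps: the multinomial coefficient (t*s)! / (s!)^t.
    return factorial(threads * steps_each) // factorial(steps_each) ** threads


print(branch_paths(20))
print(interleavings(2, 10))
print(interleavings(4, 10))
```

Twenty branches already give about a million paths, two ten-step threads give 184,756 orderings, and going to four threads produces a number with more than twenty digits. Real programs have loops and shared state on top of this.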

The corollary of 3) is that you cannot avoid being selective about what you test for, so the question becomes, do you want that decision to be an informed one, or will you allow it to be decided by default, as a consequence of your choice to aim for a specific percentage of path coverage?

For example, there are likely to be many things that could be unit-tested for, but which could be ruled out as possibilities by tests at a higher level of abstraction. In that case, time spent on the unit tests could probably be better spent elsewhere, especially if (as with some examples from the article) a bug is not likely.

100% path coverage is one of those measures that are superficially attractive for their apparent objectivity and relative ease of measuring, but which don't actually tell you as much as they seem to. Additionally, in this case, the 100% part could be mistaken for a meaningful guarantee of something worthwhile.

Partly related: Is equivalence partitioning [1] used much in the industry in software testing?

[1] https://en.wikipedia.org/wiki/Equivalence_partitioning

I came across the concept some years ago, IIRC, in the classic book The Art of Software Testing by Glenford Myers [2], and used it in a few projects. But have not really heard or read of it being used much, from people or on forums.

[2] https://en.wikipedia.org/wiki/Glenford_Myers

I think it can be complementary to some of the other approaches mentioned in this thread, which, BTW, is interesting.

In the form discussed in [1], it gets pretty complicated when more than one operand is a variable, and they are derived from prior operations on the original data.

Alternatively, identifying equivalence partitions from the semantics of the input data faces the problem that a faulty program may create invalid partitions, so some test cases for a given equivalence partition pass, while others fail.
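A small Python sketch of both the technique and the caveat just described. The shipping-fee rule, the function names, and the chosen representatives are all hypothetical, made up for illustration:

```python
def shipping_fee(weight_kg: float) -> int:
    """Reference rule: light (0, 1] -> 5, medium (1, 10] -> 10, heavy -> 25."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        return 5
    if weight_kg <= 10:
        return 10
    return 25


def faulty_shipping_fee(weight_kg: float) -> int:
    """Same intended rule, but a bug splits the 'medium' partition in two."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        return 5
    if weight_kg <= 5:
        return 10
    if weight_kg <= 10:
        return 15  # bug: the rule says 10 for the whole (1, 10] range
    return 25


# One representative input per partition, with the expected fee.
REPRESENTATIVES = {"light": (0.5, 5), "medium": (3.0, 10), "heavy": (20.0, 25)}


def passes_partition_tests(fee) -> bool:
    """Run the one-test-per-partition suite against an implementation."""
    return all(fee(w) == expected for w, expected in REPRESENTATIVES.values())
```

Both implementations pass the partition suite, yet `faulty_shipping_fee(7.0)` returns 15 instead of 10: the partitions derived from the spec are not the partitions the buggy code actually has. Adding boundary values such as 1, 10 and 10.01 would catch this particular bug, though not every possible split.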

You raise some interesting points about issues with the method.

I had used the approach in your second paragraph above.

I agree that it can have the issue you mention. But is this not more or less the same as the issue that even test code can have bugs in it? But we still use test code. For that matter, even human testers doing manual testing can make mistakes. But we still do manual testing.

I agree that it is a valid technique, and in fact it is widely used: a minimal case is when we use a test account ID to stand for all accounts. The pessimist in me was looking for the exceptions, which is actually not a bad trait when you are testing.

> The pessimist in me was looking for the exceptions, which is actually not a bad trait when you are testing.

Agreed :) Good discussion.

Also: the fundamental undecidability of the Halting Problem

... is really not relevant to software testing.

If you actually know what the halting problem is, you know that if your code triggers that, it is pretty much broken. If you're not writing a programming language.

The worse the developer, the more tests he'll write.

Instead of writing clean code that makes sense and is easy to reason about, he will write long-winded, poorly abstracted, weird code that is prone to breaking without an extensive "test suite" to hold the madness together and god forbid raise an alert when some unexpected file over here breaks a function over there.

Tests will be poorly written, pointless, and give an overall false sense of security to the next sap who breathes a sigh of relief when "nothing is broken". Of course, that house of cards will come down the first time something is in fact broken.

I've worked in plenty of those environments, where there was a test suite, but it couldn't be trusted. In fact, more often than not that is the case. The developers are a constant slave to it, patching it up; keeping it all lubed up. It's like the salt and pepper on a shit cake.

Testing what you do and developing ways to ensure it's reliable, fault-tolerant and maintainable should be part of your ethos as a software developer.

But being pedantic about unit tests, chasing after pointless numbers and being obsessed with a certain kind of code is the hallmark of a fool.

"The worse the developer, the more tests he'll write."

no.... please..... no..... let's not go down this path. Why does everything have to be extreme? No tests, 100% tests... it's all bollocks.

The most robust code I've ever written was 100% because of unit tests. It was a little engine that approved the prior authorization for medications. The unit tests didn't cover 100% of the application. In fact the only bit it did cover was the various use cases in the approval logic. The tests were invaluable in writing the logic, and continued to be invaluable for the maintenance of the project.

Tests are a tool. You don't need to use the tool for EVERYTHING, but sometimes it's the best tool for the job. Using it doesn't make you a bad developer, using it badly makes you a bad developer.

Every time I've seen these arguments it's been the same:

* Bad coding practices lead to bad tests

* Bad test practices make development very difficult, since you're constantly chasing your tail

They're invariably intertwined. Bad tests can be re-written. Bad code can be rewritten/refactored too, but it tends to have a larger impact on the product (you know, the thing that actually makes you money).

However, it seems that a lot of posters see bad tests and say that it's the problem, and therefore most tests are bad. They neglect that bad tests are a smell that what they're testing is probably shit. They're treating (condemning) the symptom, not the cause.

> The worse the developer, the more tests he'll write.

This is not only wrong, but also dangerous.

I agree with the second part of your comment (about being pedantic about the number of unit tests), but a good and comprehensive test suite is fundamental for large projects, especially projects with many moving parts and a large turn-over of people (basically any big enterprise).

By far the best codebase I ever worked with had zero test cases, was nearly 30 years old, had been worked on by not just several teams but several companies, had basically zero documentation, and was still easy to read and understand.

Unit tests only really catch a tiny fraction of bugs relating to defective code, not poor design, poor market fit, poor understanding of requirements, or any of the other bugs that most often make it into production. They can be a huge benefit to many possibly most projects, but there are no silver bullets, good quality needs competent people given sufficient time and resources and nothing really changes that equation.

> not poor design, poor market fit, poor understanding of requirements, or any of the other bugs that most often make it into production.

That's because unit tests aren't supposed to be a magical silver bullet that automatically fixes everything. Unit tests are like an exercise regimen. Just because you work your ass off in the gym doesn't mean that you'll get the result you want. If you're still eating a massive amount of calories, have inconsistent workout patterns, or do exercises incorrectly, you will only see minimal gains. Likewise, if your unit tests are not testing the correct logic, it's just an exercise in futility.

People make jokes about the infamous null test:

    Person p = new Person();

Yet there are a non-zero number of production code bases with those kinds of tests attached to them. And I firmly believe that this is one of the drivers of hatred towards unit tests: people see poor unit tests like the above, so they instinctively write off unit tests as a waste of time.
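For contrast, a Python analogue of the null test next to a test that pins down behaviour. The `Person` class and its `full_name` method are hypothetical stand-ins; the snippet above does not say what the real class does.

```python
class Person:
    """Hypothetical class standing in for the Person in the snippet above."""

    def __init__(self, first: str = "", last: str = "") -> None:
        self.first = first
        self.last = last

    def full_name(self) -> str:
        """Join the name parts, dropping the gap left by a missing part."""
        return f"{self.first} {self.last}".strip()


def test_null() -> None:
    # Runs the constructor, so it counts toward coverage, but it can only
    # fail if construction itself raises.
    p = Person()
    assert p is not None


def test_full_name() -> None:
    # Pins down behaviour a caller actually depends on.
    assert Person("Ada", "Lovelace").full_name() == "Ada Lovelace"
    assert Person("Ada").full_name() == "Ada"
```

Both tests execute the constructor; only the second would notice if `full_name` regressed.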

Just because someone uses a tool incorrectly doesn't mean that the tool does not have value. A spudger isn't going to drive a nail into the wall, but that's not what a spudger was designed for.

I worked on one such system (written in Ada, of all things) back when I was a fairly junior developer. I was hired to work on a major effort to port the system from SPARC/Solaris to RHEL/x86 and add some new features. It was designed by a crusty (and eccentric) old architect (who was still around to manage this major refresh project) who ruled with an iron fist, and the project was managed in classic waterfall fashion.

He was not afraid to look you in the eye and say: "Your crappy code is not going into my codebase until you fix X, Y, Z, and change your indents to match the style guide." There were no "unit tests", per se, only a test driver program that he had written himself.

He was a hard guy to like, but I had to admit, that system had a clean, coherent, and modular design the likes of which I haven't seen since.

That would be an ... interesting work environment. Probably the dev version of benevolent dictator, works great until the leader leaves, and the team is left with weaker control structures.

Yea, it was only a 9 month contract for me. I was definitely glad not to be a FT employee at that place :)

Not sure where you are going with this story. Do you mean that tests are useless, because you once worked in a company that thought that a large, test-less codebase was a good idea?

Hint: anecdotal fallacy

No, because I have worked with several dozen code bases, hundreds if not thousands of bugs, and Unit Tests are if anything a bad sign associated with code in desperate need of refactoring. They are the uncanny valley of quality.

Ed: Unit Tests become more valuable as code quality decreases.

Hmm... You're capitalizing "Unit Tests" like there's a need to distinguish the difference between those and what I infer is "code that was not Unit Tests that we ran to verify correctness". Am I correct? Are you drawing a distinction? Or did you truly have no code used to verify correctness? Or no automated way to test?

There are many types of tests, Integration tests are different than Unit Tests. Having a script click on UI elements is fundamentally different from isolating a function to verify it does what it should.

Survivorship bias.

What about all the shitty 30-year-old code bases with no tests and no documentation that got dropped because they were garbage?

It's not impossible to write good code without tests; it is just more likely that code with tests and docs will be good enough to survive 30 years. Your project that had no tests or docs and was still good might be 1 in a million, while software that is good with tests and docs might be 1 in 100.

The tests that are written today will not cover future cases 30 years down the road. Code bases 30 years ago are not the same shitty code bases you see today. Programming was more difficult and required better programmers who wrote better code on average. TDD is needed today because the bar has been lowered and it acts as a safety net.

Maybe large teams are worse at development than small teams, so they need more tests.

> The worse the developer, the more tests he'll write.

As always, generalization is the tool of the fool (sorry for the fool part, but it rhymes ;) ). Writing pointless stubs / mocks and testing execution order of statements is definitely a bad pattern, writing many and good functional, e2e and integration tests however is not.

100% agree. Full code coverage with acceptance and integration tests is really good. If you can get a test case for every feature that your app is supposed to have you can develop quickly and ship often knowing that nothing will break. Extensive unit tests that don't test what features do but test how features are implemented are usually a waste of time in my opinion, unless there is some complicated and critical algorithm that you need more visibility and protection for. Otherwise the only purpose of those unit tests is to break for good refactors and double dev time.

When you are cleaning up tech debt and make hundreds of lines of changes and no acceptance tests break (because everything is still working as it should) but 50 unit tests break... I've wasted so much of my life

On top of that this statement will lead (some) new developers to just say "oh yeah I don't need tests because the gosus write only a few tests as well". Same pattern can be observed in DOTA or other games where newbies say "oh I don't need wards/intel because pro players don't need it either". The difference is that pro players already know what's going on.

Also worth a read while we're at it: https://www.linkedin.com/pulse/20140623171038-114018241-test...

Generalization is at the heart of science. The lack of generalization is one of the most frustrating attacks you can launch on a scientist's empiricism.

We are talking about software testing here.

To a first approximation, nobody is being empirical, and there is no science being done.

When you are writing tests, that's an empirical, rather than a theoretical approach to software correctness.

When programmers change a factor to "see what breaks", that is very much an empirical activity, and it is part of the programmer's theory-building of a phenomenon.

If a young child takes a gear from a watch and observes its breaking, that is very much an empirical activity. It is also the beginning of theory-building.

You don't need MANOVA to engage in empiricism.

I should have been more clear, I see now my wording was ambiguous.

Approximately nobody is being empirical about what works well and does not in testing. Use this approach vs. that approach, do this, no - do this instead. It's all largely heuristic.

Individuals performing testing and debugging are usually at least mostly empirical, I agree (although occasionally the rubber chickens come out)

So the worst of us developers out there write tons of tests? We wish. Who cares if it's 80% or 100%, I'd rather inherit a codebase with lots of test coverage than without.

I wish to inherit code that is clear, concise and works over any tests thanks. Each to his own.

So you're saying test coverage is a negative? It's not an either or.

Test coverage is a plus but not more important. We have an end to end test system that runs on each commit. What I am saying is perfect code, simply written is way way more important than complete coverage. Tests, in my experience, don't make code better written or more concise. Given a choice between coverage and good code I choose good code. In my experience time is always short so I choose to spend the time on good code. In practice this is a compromise anywhere but the most cashed up companies.

If I am understanding correctly, he is saying that well-written code requires fewer tests in order to be fully tested. Fully tested is better than untested; assuming that code is fully tested, well-written code will require fewer tests.

That is my understanding, as it is rather hard to extract a solid argument from his posts.

He specifies "any tests", as do several other anti-test posts in this discussion, though often someone will reply to them and assume, like you have done, that they have another complex set of automated tests that they're happy about, and they're only railing against some unspecified subset of automated tests. I'm not sure if that's being over-charitable or not.

Not writing tests is a great way to ensure future developers won't refactor your code

Oh I completely agree, and would much rather be working on a codebase with too many unit tests than too few. I'm trying to come up with the most charitable explanation that I can.

Unit tests are a tool to get the codebase to that point... When used properly.

And what about test coverage for the tests? You do realize that tests are also code, right? They have bugs, maintenance overhead, and all the other things code has.

The worst code bases I've seen have more test code than code being tested. People try to write tests to cover every condition or input and in systems of even modest complexity that isn't possible, or at least not feasible in a finite time.

The actual code is the check on the tests. They cover each other; acting as though they don't, and as though there is some infinite recursion, is dishonest.

Tests do have a maintenance cost, but it is much lower than the maintenance cost of code without tests. Without automated tests, if there is a bug it must be caught manually, and this costs human time every time the tests are run. That also presumes the humans do the tests correctly. Have a new team, or a QA person out sick? What is the cost now?

The cost of not having tests is bugs in production. If I write code and there is a bug that impacts production, many millions of dollars are on the line, and this is true for many developers. Write software for an airplane? Crashes can cause crashes, so they are dangerous and expensive. How about something more common: write software for an online store? If the shopping cart drops 1 in 1000 items, that is a huge amount of money, not just because of lost sales, but also angry customers. You will want to run the tests at least a few thousand times to catch that; a human won't do this, but an automated unit test suite is a shell script and a VM away from doing that.
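A back-of-the-envelope Python sketch of the "few thousand times" figure. It assumes each run hits the 1-in-1000 failure independently, which is a simplification not stated in the comment; the function names are made up.

```python
import math


def detection_probability(failure_rate: float, runs: int) -> float:
    """Chance of seeing at least one failure in `runs` independent runs."""
    return 1 - (1 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of independent runs that reaches `confidence`."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))
```

Under that assumption, 1000 runs catch the bug only about 63% of the time, and roughly 3000 runs are needed for a 95% chance. That is cheap for a script and hopeless for a human tester.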

Automated tests don't make this impossible, but they reduce the number of bugs that can make it through. Suppose a test costs 10 hours of developer time at $500/hour and stops one bug that would have alienated 1% of customers. Then there would only need to be 500,000 customers worth $1 each for the test to pay for itself. Clearly these numbers have insanely inflated costs, yet they still make sense for many businesses. And this is only accounting for one bug. Finding that bug did not consume the test, it is still ready for more. A good test can catch many bugs and last many years.
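The break-even arithmetic, spelled out as a short Python sketch. The figures are the ones from the comment; the function and parameter names are made up for illustration.

```python
def cost_of_test(hours: float, hourly_rate: float) -> float:
    """Developer cost of writing the test, in dollars."""
    return hours * hourly_rate


def break_even_customers(cost: float, value_per_customer: float,
                         fraction_alienated: float) -> float:
    """Customer count at which the prevented loss equals the test's cost.

    Prevented loss = customers * value_per_customer * fraction_alienated.
    """
    return cost / (value_per_customer * fraction_alienated)
```

With 10 hours at $500/hour the test costs $5,000; losing 1% of customers worth $1 each costs $0.01 per customer, so the two are equal at 500,000 customers.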

I don't disagree that tests are good and necessary. I take issue with the view that tests are somehow free, that more testing is better, or that tests are of such marginal cost that more effort developing tests than the product is considered a reasonable use of resources.

Also, the notion that "they test each other" is likely to be dangerous.

> Also, the notion that "they test each other" is likely to be dangerous.

About as dangerous as double entry bookkeeping. Of course it doesn't provide any absolute guarantees, but having people state things in multiple contexts and checking their consistency is one of the better approaches we have for finding errors.

The worse the developer, the more sure they are that they can write perfect code which doesn't need any tests because it's perfect and will be perfect even after future changes, even after changes in related code - absolutely perfect code.

How about SQLite? Hardly anyone could claim that it was built by the team of "bad developers". However: "the SQLite library consists of approximately 122.9 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 745 times as much test code and test scripts - 91596.1 KSLOC."

I've had good experiences here and bad. The absolute worst was when I took over a project that a contract firm had developed and had to work side by side with them for a few months.

They were a 100% test shop. The site itself was broken, certain parts of it would literally crash the web server for certain users because the code was so terrible...but they had tests for everything.

What's worse, when fixing some of the broken code they got mad at me for making it harder to write tests...because somehow in this mindset a working product was less important than working tests. I've been wary of test nazis ever since.

This hits home with me. Incompetent developers need dogmatisms (100% unit coverage!!!! etc)

No, inexperienced developers need dogmatisms. Subtle difference perhaps, but it's worth remembering that every competent developer started out being inexperienced, and that it's nothing to be ashamed of.

No. We fetishize experience too much. I've known a great many competent inexperienced developers who could weigh tradeoffs and use the right approach to a given task, and far too many highly experienced developers who destroyed productivity through dogmatism.

Productivity is not the sole measure of competency, nor necessarily even a very good one. Productivity suffers when time is spent making software robust, secure, and performant, and the reverse is obviously also true. Likewise, short term productivity sometimes suffers when effort is spent to ensure long term productivity. Again the reverse also holds true. The problem is that short term productivity is often the only measure of competency that is immediately visible. Nobody gets recognized for the problems they don't cause.

You are correct. I was once inexperienced and was very dogmatic.

> The worse the developer, the more tests he'll write.

This is obviously wrong.

> Tests will be poorly written, pointless, and give an overall false sense of security to the next sap who breathes a sigh of relief when "nothing is broken". Of course, that house of cards will come down the first time something is in fact broken.

That's a generalization that does not apply everywhere.

It's rather apparent to me anyway, what the OP meant. If I see someone expending an enormous effort writing tests, I'd probably conclude the same, unless I had knowledge of their expertise, in which case, I'd respect their judgement. The OP meant to draw a line which in their mind, distinguishes an expert from a novice. I don't see what so many people are getting all bent out of shape for.

>This is obviously wrong.

What makes it obviously wrong? It seems to be a personal opinion.

>That's a generalization that does not apply everywhere.

Does a comment necessarily have to contain statements that apply to every single person in every single situation at every point in time? I think we'd have to delete pretty much all comments on this website then.

If we strike your first sentence, "The worse the developer, the more tests he'll write", then I can mostly agree with you.

It sounds like you have never worked on a project with no tests and those same developers that "write long-winded, poorly abstracted, weird code that is prone to breaking". A test suite doesn't eliminate bad code or bad coders, but it does give the better people on the team a handle to grab them by.

> The worse the developer, the more tests he'll write.

Some of the worst developers I've worked with never wrote any tests. Like zero tests. Actually in one case a negative number of tests; the engineer deleted tests I wrote because they were failing the build because of this engineer's (breaking) changes.

Every good developer I know writes tests.

I inherited a code base with LOTS of test coverage.

Made one small change and everything broke.

Looked into the tests, oh the humanity!!!

Dropped the test suite entirely, haven't had any issues since.

We have VERY good logging and notifications though so anything goes wrong we know all about it.

> Dropped the test suite entirely, haven't had any issues since.

That you know about.

That bad test suites exist is unarguable.

Bad production code exists too, but we don't give up on writing production code.

You know all about it at run-time.

I knew a guy who used to wire down safety features on old gas stoves. He said similar stuff about how unnecessary these were. He died (along with his daughter) a year or two later because he disregarded what ski areas were and weren't off limits. Something he'd no doubt done before without any problems. You're fine until you aren't.

Why not write tests that make sense then?

It would definitely help in maintaining the code (not to mention not having to go through the entire code base when debugging a production issue).

Any causal phenomenon has some number of factors which it depends upon. A test ought to provide large insight to the user with respect to which factors are the most likely culprits for a specific expression of the phenomenon, and it also ought to help a programmer determine the number of factors at play.

If a competent programmer cannot quickly determine how many factors are in play, or if that programmer cannot constrain the number of relevant factors to a humanly manageable number, then I question how insightful the sum of tests were.

Completely disagree. The worst developers don't even care if their code ends up breaking later; as long as it gets past QA now, that's all that matters.

> The worse the developer, the more tests he'll write.

Spare us your platitudes. Most developers don't decide the code coverage, the manager or the client does.

The manager is hopefully also a developer, and listens to his team. Otherwise you have a manager who sets arbitrary quality metrics that the dev team doesn't agree are useful - and that's a way bigger problem.

In the case of contractor work where a customer actually buys the code and not just the functionality - it's very tricky. I have never done contractor work so I'm not aware of how contracts are usually written. How do you set a quality metric? I'd be much more comfortable agreeing to have a third party judge the quality than having an arbitrary metric in a contract, e.g. the ratio of comments to lines of code, the average number of characters in symbol names, or the percentage of lines of code covered by tests.

The tragedy of 100% code coverage is that it's a poor ROI. One of the things that stuck with me going on twenty years later is something from an IBM study that said 70% is where the biggest bang-for-the-buck is. Now maybe you might convince me that something like Ruby needs 100% coverage, and I'd agree with you since some typing errors (for example) are only going to come up at runtime. But a compiled (for some definition of "compiled") language? Meh, you don't need to check every use of a variable at runtime to make sure the data types didn't go haywire.

The real Real Tragedy of 100% coverage is the number of shops who think they're done testing when they hit 100%. I've heard words to that effect out of the mouth of a test manager at Microsoft, as one example. No, code coverage is a metric, not the metric. Code coverage doesn't catch the bugs caused by the code you didn't write but should have, for example. Merely executing code is a simplistic test at best.
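A tiny Python illustration of "the code you didn't write but should have". The `average` function and its test are hypothetical, not taken from any project mentioned here:

```python
def average(values: list) -> float:
    """Arithmetic mean. Note what is missing: no handling of an empty list."""
    return sum(values) / len(values)


def test_average() -> None:
    # This single test executes every line and every branch of average(),
    # so a coverage tool reports 100%.
    assert average([2, 4, 6]) == 4


def fails_on_empty() -> bool:
    """Return True if the uncovered-by-design case still blows up."""
    try:
        average([])
    except ZeroDivisionError:
        return True
    return False
```

Coverage measures which existing lines ran. It has nothing to say about the missing `if not values:` branch, because there is no line there to mark as uncovered.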

70% coverage... I don't know if I'm going to be able to give up my 0% coverage 1-man projects...

I'd recommend aiming for 10-20% on those projects, and also for startups trying to rapidly push an MVP out.

Tests have diminishing returns. You want to hit the absolute most crucial ones that give you plenty of bang for buck and even save you time. That means finding the (usually small handful) of functions that implement your most crucial and most complicated business logic, and writing tests for them.

Anything past that is for companies with customers that need to maintain a certain level of quality and service. Worry about it when you get there.

Agreed. If you're a small start-up in a hurry and <4 devs, the best strategy would be to write top-down smoke tests on your API that will actually tell you when things break, even if not exactly where. Their actual derived coverage might be pretty high.

As long as that 10-20% is the more complex bits, and you are capable of picking out which bits are complex, it's IMO better to do that and spend the time saved on other shit that increases maintainability and reliability. Killing tech debt, killing your teammates, whatever.

How do you know what 10-20% to test?

This sort of basic level of decision-making for testing is something I wish I had, but all the tutorials and guides are about 100% code-coverage TDD so it's hard to find a path to learn reasonable, high ROI testing.

You should test the thing that makes you money first, and should delay testing supporting functionality. For instance, if I am writing a document converter, the thing that makes me money is the AST -> AST conversion. Testing that should come before testing parsing (bytes -> AST) and rendering (AST -> bytes).

The place where you make money is the place that will have the largest demand for new and changing functionality. And where things change the most is where you need tests to protect against regressions.

Since what to test is a sliding scale, determining when and what to test is perhaps something that comes with exposure to good testers.

For me, a few references I use are "tests are a thinking help for specifying behavior", "tests hold behavior in place" and "test until you feel comfortable".

The first is a note that TDD thoughts are written by developers that write APIs, not by developers that write applications. TDD is a fantastic tool for an API designer, because they force you to think about the experience of using the API. So, whenever I design APIs, I like TDD. This is also a good argument for why you should minimize setup code that "goes behind the scenes" - if you're testing, say, a REST API, do as much of the setup and assertions as you can via the REST API as well!

The second helps me remember to think "a year from now, are all the behaviors this code needs to have obvious, or is someone likely to unintentionally break it?". I try to write tests that will flag if someone broke an important edge case - or the main use case! Tests can be used as the programmer's equivalent of a carpenter's clamps, kind of.

The third is why I don't write as many tests anymore. I normally try to write one workflow-oriented feature test up front, like "When a user creates a new invoice, then that invoice should show up in the user's list of invoices with the values they entered, plus x,y,z auto-generated parameters". As I implement the feature, if I come across a piece of logic that makes me feel uncomfortable - lots of branching, or code that's very important that it stays intact - I'll write a unit test or two to hold that in place; sometimes I don't write any unit tests, meaning I'd have written just one or two tests over the course of two or three days of implementation.

It varies wildly on the type of application you're building. I can only speak for front-end development of complex SPAs w/ React.

Generally, most architectures in this domain have a combination of UI components, a data store, a set of update logic for the data store, and a set of asynchronous controllers that respond to events, interact with APIs, and call the aforementioned update logic.

In React the UI components are declarative, they (generally) contain no logic or algorithms, just a mapping from state to DOM. I see basically zero value in testing these. Bugs are almost always of the 'forgot to actually implement' variety, or are related to the way the page is rendered in a particular browser, rather than the DOM output the components are responsible for.

The data store update logic is usually either simple setters/getters (which don't need testing) or complex data transformations (which do).

The controllers also come in simple and complex varieties. Simple ones (one API call, one data store update once it's resolved) don't need testing. Anything more complex than that probably does.

So those are the two main targets for testing in the apps I build. I generally don't bother with anything else.

There are exceptions though. For example, here's an accordion UI component I built which relies on an asynchronous manual DOM update after the React DOM update has finished resolving. This could almost definitely use tests, if only to help any maintenance developers understand what it's doing.


Basically, as long as you have some sort of sane architecture, there should only be a few potential targets for testing, and they should be easily identifiable.

Uncertainty. Look for the parts of the code where you feel least sure you have the logic right.

A simple example of this is the bowling game kata: given the throws in a bowling game, calculate the (final) score. The 'hard' part is to keep strikes and spares in mind, including the bonus throws when a strike or spare is scored in the 10th frame.

If you were making an application that would help you keep track of your bowling score, that score calculator would have the highest ROI in terms of testing.
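For reference, a compact Python sketch of that score calculator. It assumes the input is a complete, valid game given as a flat list of pins per throw (no input validation), and the function name is made up:

```python
from typing import Sequence


def bowling_score(rolls: Sequence[int]) -> int:
    """Final score of a complete ten-pin game, given pins knocked per throw."""
    score = 0
    i = 0  # index of the first throw of the current frame
    for _ in range(10):
        if rolls[i] == 10:
            # Strike: 10 plus the next two throws; the frame uses one throw.
            score += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:
            # Spare: 10 plus the next throw; the frame uses two throws.
            score += 10 + rolls[i + 2]
            i += 2
        else:
            # Open frame: just the pins from its two throws.
            score += rolls[i] + rolls[i + 1]
            i += 2
    return score
```

The tests worth writing are exactly the uncertain cases: a spare followed by a normal throw, a strike followed by two throws, a perfect game (twelve strikes, 300), and a 10th-frame spare with its bonus throw.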

I've been working on a 1-man project with maybe 5% test coverage, just for some critical libraries that I ended up refactoring a few times. There's actually one little library that has 100% test coverage and super detailed error messages, because that's where many of the bugs seemed to happen.

I also have a simple integration test that just clicks through everything and makes sure nothing crashes.

Not having a lot of tests can be painful. Especially when you're learning new languages or frameworks, you almost always want to go back and rewrite some code (or just reorganize it into different files), and it's really nice to have tests when you do that. So sometimes that gives me the motivation to start writing up a bunch of tests, and then after that I dive in and refactor everything.

Being able to scale-up is interesting, both human logistics-wise and machine performance-wise, but the ability to scale down is also interesting.

Doesn't it also depend on the application?

I mean I wouldn't settle for less than 200% test coverage on an automatic pilot for landing airplanes. If it is a one-off script that happens to become a part of a temporary business process, perhaps just a sample set of data, a desired output and a small tool for comparing the results is enough.

My main issue with unit testing is what defines a unit?

Throughout my career I have found tests that test the very lowest implementation details, like private helper methods, and even though a project can achieve 100% coverage it still is no help avoiding bugs or regressions.

Given a micro service architecture I now advocate treating each service as a black box and focus on writing tests for the boundaries of that box.

That way tests actually assist with refactoring rather than be something that just exactly follows the code and breaks whenever a minor internal detail changes.

However, occasionally I do find it helpful to map out all input/output for an internal function to cover all edge cases. But that's an exception.

I agree with you. That's called functional testing, and it is very useful, but it is not unit testing.

Unit testing: Test all methods and paths of a class, even private ones.

Functional Testing: Test the public API of a class/service only. If something is wrong internally, it will be caught without having to write countless little tests.

ROI of functional testing is high, as it is usually done with real data. In my opinion unit testing is a huge waste of time. Most of the tests devolve into mock objects, calling mock methods, and doing things that really don't help to find real world bugs, where two unit tests pass, but their methods produce the wrong output.

Some counterpoints:

- If you want to know if your utility classes and functions are sane, unit testing is far better bang for your buck than trying to figure out whether they're being adequately exercised in your service tests.

- If you're trying to figure out which part of a complicated system broke, having unit tests that break on the specific module, or class, or method can be quite helpful.

- Yes, integration tests can be mocked up to look like real world data, and of course you can even feed them real data. The flip side is that their data requirements can be heavy, and they can be quite cumbersome to set up.

I think any test strategy – unit, integration, e2e, acceptance, UI – pursued to exclusion is a bad idea. Different projects and different teams call for different balances between them.

> - If you want to know if your utility classes and functions are sane, unit testing is far better bang for your buck than trying to figure out whether they're being adequately exercised in your service tests.

I think we talk about the same thing, sometimes I will test something internal.

I've thought about what I'm doing as considering larger "units" in my unit testing, but perhaps "functional testing" as parent introduced is what I'm advocating. I see this as distinct from integration testing. In a micro service architecture, my integration test would be chaining multiple services together into test scenarios.

I'm kind of flexible on my terminology, perhaps more flexible than others, since my approach to testing has largely been self-taught from the experience of several shops.

To me, unit tests are when I'm testing the behavior of a "thing" in "isolation", for some definitions of those terms – and yes, I agree that those definitions vary greatly. However, I don't agree with you that the flexibility of the definition is an issue.

Integration testing for me is when I'm testing system interfaces, so I'm focusing on how my systems behave when they come together. Sometimes I do this in isolation, using stub services or even internal stubs to simulate another system's behavior, but generally I do it with an actual system when it's convenient.

I don't generally use the term functional testing because I personally think it's ambiguous – I'm testing functionality in either case! But I suspect our differences really just boil down to how you slice it. If you prefer to divide tests into black-box vs white-box, but don't care as much about the specific level of isolation involved in the test, functional testing is perhaps the term you'd prefer. I prefer to categorize tests in terms of the amount of isolation I'm using, in which case I'm basically thinking unit, integration, e2e.

"If you want to know if your utility classes and functions are sane, unit testing is far better bang for your buck than trying to figure out whether they're being adequately exercised in your service tests."

This is actually where doctests work pretty well as they both document and test.
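A minimal sketch of what that looks like with Python's standard `doctest` module (the `clamp` utility is a made-up example):

```python
import doctest


def clamp(value, low, high):
    """Limit value to the inclusive range [low, high].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


if __name__ == "__main__":
    # Runs every example found in the docstrings of this module
    # and reports any whose actual output differs.
    doctest.testmod()
```

The examples in the docstring are what a reader sees in the documentation and also what gets executed as the test.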

"If you're trying to figure out which part of a complicated system broke, having unit tests that break on the specific module, or class, or method can be quite helpful."

I don't find tracking down bugs with a repeatable test case to be much of a problem (events in live systems are a different story). It becomes even less of a problem if you sprinkle some assertions around the code that block off invalid code paths.

"Yes, integration tests can be mocked up to look like real world data, and of course you can even feed them real data. The flip side is that their data requirements can be heavy, and they can be quite cumbersome to set up."

Building mocks in unit tests is usually even more cumbersome and tedious.

For overly large sets of real world data I've had some success taking live database snapshots and cutting out 95% of the data before using the cut down to size dump for testing.

> even private ones.

You don't test private API, that's why they are declared private, it makes absolutely no sense to test private members.

Now the use of the private member might lead to a different code path : it has to be tested.

Unit Testing : test one unit ( a class for instance ) in isolation, which means all collaborators (the classes the tested class depends on) have to be stubbed or mocked.

Functional Testing : test multiple units at the same time to check the behavior of an entire functionality, that's why it is called functional testing. Stub the IO.

End-to-end testing: Test the entire system with the IO.

> it makes absolutely no sense to test private members

Don't agree. I think this is something that people tell themselves to justify the anti-pattern that is private members being nearly impossible to test directly in a variety of languages, but it seems like nonsense. I've never seen a real case for excluding private methods as a testable unit.

> I've never seen a real case for excluding private methods as a testable unit.

Here's the reason:

We're not writing tests; we're writing SPECIFICATIONS.

Private methods are implementation details; they're not part of the specification.

That's potentially one philosophy on it, sure. Yet that still doesn't preclude private methods from being an independently testable unit. Implementation details are the meat of the whole thing, and so they seem inherently testworthy. The glue code itself, less so.

Thanks for that thought though, definitely something I'll think on :)

You should be able to get the code coverage you need (even if 100%) by testing the public API of any class. If this isn't true, it means you have some private methods you can delete.

That's how the unit you're testing is used - through its public API. The public API is the specification of how it works.

By only testing the public API, you allow yourself maximum ability to refactor in the future, while still maximizing code coverage. It means that simple refactoring (inlining methods, for example) won't break tests. More importantly, a failing test means something is wrong that is potentially relied upon elsewhere. If you test private methods, you will get test failures without the public API of the unit having changed at all.
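A small sketch of the idea, using a hypothetical `Cart` class with one private helper (prices are integer cents to keep the arithmetic exact):

```python
class Cart:
    """Hypothetical unit under test."""

    def __init__(self):
        self._items = []

    def add(self, price_cents, quantity=1):
        self._items.append((price_cents, quantity))

    def total(self):
        return sum(self._line_total(p, q) for p, q in self._items)

    def _line_total(self, price_cents, quantity):
        # Private helper: could be inlined into total() at any time.
        return price_cents * quantity


def test_total_via_public_api():
    cart = Cart()
    cart.add(250, quantity=2)
    cart.add(100)
    # Exercises _line_total without naming it, so the helper is covered
    # and the test keeps passing if the helper is inlined or renamed.
    assert cart.total() == 600
```

A test that called `_line_total` directly would break on that refactoring even though nothing observable about `Cart` had changed.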

Your point about "each function is a unit" is fine, you can justify testing private methods with that - but it's inefficient. If it's not necessary to be in the spec (public API), why have you made it so? You're over-complicating the design by locking yourself into implementation details in places where you don't need to.

In my experience I've found that testing private methods directly is a code smell. It shouldn't be necessary.

> That's potentially one philosophy on it, sure.

That's not potentially one philosophy, that's the definition of unit testing.

> Yet that still doesn't preclude private methods from being an independently testable unit

Then these methods ain't private as they break encapsulation. You can't have it both ways. Either a method is private or it isn't.

> Implementation details are the meat of the whole thing

But unit testing isn't about testing implementation details, it's about testing a specification that your API must respect, because that's the behavior collaborators consuming that API rely on. If your collaborators can call a private API then that API isn't private in the first place.

I think the problem lies with the definition of "unit" which you quote above. As with designs stacking on top of one another at various levels, one level's functional test is often the next level's unit test. Your app's unit test might mock the very same things that are covered by the functional test for an underlying library. That library's unit tests might mock the very same things that are covered by the functional tests for one of its own dependencies, including the OS. And so on, even down to the level of functional units within a chip. (Chip verifiers have a lot to teach us software types about this kind of thing BTW, and I know because I've worked with a few.)

The distinction I always try to make is whether you're testing a contract or an implementation. If you're testing a contract, ROI is likely to remain high even as you move to finer granularities. If you're testing an implementation, so that the tests fail even when the implementation is 100% correct, then ROI falls off a cliff.

That's exactly what happens when the "unit" gets too small. If certain private methods can't be called a certain way because of constraints imposed by the class's public methods, then testing those calls is testing things that can't happen. That takes time away from testing (and implementing fixes for) things that can happen. If the class's contract changes so that those internal methods can be called in new ways, then yes, a 100% code-coverage unit test might catch the error. So, in all likelihood, would the new functional test accompanying the public-method change. The delta between the two, times the likelihood of such a scenario occurring in the first place, is too small to justify the cost of writing those tests and the likelihood that they'll generate false negatives.

Apparently people misunderstand a unit. Apparently Kent Beck meant unit to mean a single piece of behaviour, not classes or methods. https://www.thoughtworks.com/insights/blog/mockists-are-dead...

> Apparently Kent Beck meant unit to mean a single piece of behaviour

Which is just as ambiguous. Personally I stick with the single assert principle (which may be more than one literal assert) so that whatever I'm asserting is the behavior that this test is verifying.

Sandi Metz's talk about what should be tested and what should not: https://www.youtube.com/watch?v=URSWYvyc42M

I think that depends on the code base, I agree with your points for more procedural code bases, but for functional or OO code you should be able to test most of your logic without mocks or wiring too much up.

You usually still have to mock your data layer.

>Unit testing: Test all methods and paths of a class, even private ones.

I disagree with this. Unit testing means testing the "unit" (i.e. the class/object) but still at a boundary level: the public API of that class. You should mock out any dependencies that object would have to ensure you are only testing that class but the class API has to be respected.

A private method is a hidden implementation detail; testing those would be overfitting your tests to the current implementation, meaning if you change one character anywhere in your source you will almost certainly have to change one or more tests. Plus, if your tests are so tightly coupled to the implementation, they're likely to share any bugs the implementation has, causing them to be hidden (never forget that tests are also code and therefore have bugs at a similar rate to any other code).

Writing test code requires the same level (if not more) of engineering discipline that writing the code does.

>ROI of functional testing is high, as it is usually done with real data.

I also disagree with all of this. Functional testing is a kind of sanity check that the parts actually work together once assembled. If you have proper unit test coverage (and properly designed/engineered tests!) the functional testing is basically checking configuration. The problems with functional testing are (1) testing is about checking code paths, but which paths functional tests take can be hard to predict and can differ between subsequent runs, making it hard to make any statement about what passing actually means; and (2) exactly what you mentioned as a positive: people tend to want to use what they call "real" data, i.e. data they have actually seen before. Which means it's probably only good for catching bugs they know about, not ones they've never seen before.

>In my opinion unit testing is a huge waste of time.

I would actually agree to this because I think answering the question of "how do we detect bad code" with "write more code" is problematic. I'd rather go the Haskell/Idris route and be able to prove that I have no bugs.

>Most of the tests devolve into mock objects, calling mock methods, and doing things that really don't help to find real world bugs, where two unit tests pass, but their methods produce the wrong output.

This kind of response sounds like a self-fulfilling prophecy. The exact point of mocking is to test code paths, i.e. if this dependency returns this result, how will the class under inspection behave in response? Ideally you would use mocks to test everything that every dependency could respond with. Unless the types involved have very few inhabitants, this isn't generally possible (even programmatically), but the closer you get to this the more you can trust your tests.
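To make that concrete, here is a hedged sketch using Python's `unittest.mock`. `PriceService`, its `client` dependency and `fetch_price` are all hypothetical names; the point is only that each thing the dependency can answer with gets its own test:

```python
from unittest.mock import Mock


class PriceService:
    """Hypothetical class under test; `client` is its only dependency."""

    def __init__(self, client):
        self.client = client

    def price_label(self, sku):
        try:
            price = self.client.fetch_price(sku)
        except TimeoutError:
            return "unavailable"
        if price is None:
            return "not for sale"
        return f"{price:.2f}"


def make_service(return_value=None, side_effect=None):
    # The mock stands in for the dependency and is told what to answer.
    client = Mock()
    client.fetch_price.return_value = return_value
    client.fetch_price.side_effect = side_effect
    return PriceService(client)


def test_every_response_of_the_dependency():
    assert make_service(return_value=9.5).price_label("a") == "9.50"
    assert make_service(return_value=None).price_label("a") == "not for sale"
    assert make_service(side_effect=TimeoutError).price_label("a") == "unavailable"
```

Here the dependency has only three interesting kinds of response, so the enumeration is complete; with richer return types you can only approximate it.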

> However, occasionally I do find it helpful to map out all input/output for an internal function to cover all edge cases. But that's an exception.

An example of that exception: just yesterday I wrote tests for a serialization/deserialization utility, that translates between an object tree and a wire format used by our client. There was some tricky code around the "deserialize message into one of many possible objects" part, so I wrote a bunch of unit tests for the whole thing.

I find myself writing unit tests mostly for this kind of code - complex logic that transform data and/or execute "advanced" algorithms on it. I also write tests as close as possible to the boundaries of the complex logic - this way, when I'm sure the logic itself is sound, all integration bugs tend to be trivial to notice (it means someone fucked up the inputs or didn't handle outputs properly).

The more I think of it, the more I notice that I tend to structure my programs in a functional (as in functional programming) way - lots of "services" that take things as inputs and return them as outputs, without using any external state. So e.g. that serialization/deserialization service mentioned only takes a String as input, and returns a reference to a base-class object as an output (or the other way around for serialization). Making code conform to functional style makes it not only easier to test, it helps avoid some tests entirely.
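A rough illustration of that shape, with loud assumptions: the JSON wire format, the `Message` hierarchy and the `type` field are all invented here, since the actual format isn't described:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Base class for everything that can come off the wire."""


@dataclass(frozen=True)
class Ping(Message):
    seq: int


@dataclass(frozen=True)
class Chat(Message):
    user: str
    text: str


# Maps the (assumed) "type" field on the wire to a concrete class.
_TYPES = {"ping": Ping, "chat": Chat}


def deserialize(wire: str) -> Message:
    """Pure function: string in, object out, no external state touched."""
    fields = json.loads(wire)
    kind = fields.pop("type")
    return _TYPES[kind](**fields)


def serialize(message: Message) -> str:
    kind = next(k for k, cls in _TYPES.items() if isinstance(message, cls))
    return json.dumps({"type": kind, **vars(message)})


def test_round_trip():
    assert deserialize('{"type": "ping", "seq": 7}') == Ping(seq=7)
    chat = Chat(user="ann", text="hi")
    assert deserialize(serialize(chat)) == chat
```

Because nothing here depends on outside state, the tests need no setup, no mocks, and no teardown.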

> breaks whenever a minor internal detail changes.

This is usually where I draw the line for what to test. I don't care how a unit does what it does, I only care about what it does.

In my opinion a unit should have minimum side effects, none if possible, and the results should depend on the inputs. Write the test to the contract of what the caller should provide and what the caller will get back (and then any side effects that had to happen, like checking logs).

I think a bigger epidemic is we're putting too much emphasis on "do this" and "do that" and "if you don't do this then you're a terrible programmer". While that sometimes may be true, much more important is to have competent, properly trained professionals, who can reason and think critically about what they're doing, and who have a few years of experience doing this under their belt. Just like other skilled trades, there's a certain kind of knowledge that you can't just explain or distill into a set of rules, you have to just know it. And I see that in the first example in this article, where the junior programmer is writing terrible tests because he just doesn't know why they're bad tests (yet).

It seems to me that management is taught that a dependence on expensive experts is a problem to be optimized. They want to manage the development process in a way that allows them to easily swap out one developer or team for another, or to ramp up production simply by increasing the number of developers assigned to a task.

It is almost as if they see the success that we have had in dev/ops with the "cattle, not pets" philosophy, and want to apply that in their own field. Making the subject behave consistently and predictably, whether it is a machine or a human professional, would be a prerequisite for that.

Will they succeed?

Sure, as long as their users unanimously abide by a very large rulebook ;)

I'd say they are "succeeding" in the sense that they continue to make money. Whether they're truly successful in the sense of providing proper value to their customers to justify that money, and treating their programmers right, is another story, and I've seen enough cases where they don't do either of these, while the executives make plenty of money to live comfortably for years by exploiting both customers and their own programmers.

That's why I'm highly skeptical when I hear the words "best practices".

Sure, the intention is good, but it promotes mindless repeating of patterns over thinking about what really helps.

On the other hand, every discipline that I can think of has its own set of best practices. Why should software development be any different? I know a lot of people are susceptible to the mindless repeating, but that's not a fault of best practices.

I'm not against the idea per se, just highly skeptical when I hear it used, because more often than not, I've seen it used to promote mindless repetition of previously used patterns.

We've almost stopped unit testing. We still test functionality automatically before releasing anything into production, but we're not doing a unit test in most cases.

Our productivity is way up and our failure rates haven't changed. It's increased our time spent debugging, but not by as much as we had estimated that it would.

I won't pretend that's a good decision for everyone. But I do think people take test-driven-development a little too religiously and often forget to ask themselves why they are writing a certain unit test.

I mean, before I was a manager I was a developer and I also went to a university where a professor once told me I had to unit test everything. But then, another professor told me to always use the singleton pattern. These days I view both statements as equally false.

I'm curious - how old are the systems you're working on?

In my experience, unit tests don't catch many bugs when the code is fresh. But when it's five years old with many modifications over the code base, some dumb little test that you thought was a waste of time is now alerting you to what would have been a horror regression.

In other words: Even though it feels like it's slowing you down now, if you write tests while the functionality is still fresh in your mind, and you capture your assertions well, they'll pay off dividends in the future once you've forgotten what some block of code does.

I work in the public sector in Scandinavia and some of our oldest systems still in service run on an old tandem. So some of it is pretty old.

This gives us some unique abilities in terms of modeling our productivity of course, because we started measuring before anyone thought up unit testing.

Over the past 15 years, unit testing has failed to produce anything positive, and test driven development has been an absolute disaster.

That being said, this isn't something which will be universally true. A lot of the software we use isn't built by us, and I'm certain that a lot of the suppliers of that software use unit tests quite extensively.

For the things we build ourselves, however, there has been almost no value in adopting modern test philosophies.

You say that systems have to be able to be worked on 5 years from now, but the truth is that most of our systems transport data and rarely live 5 years without getting rewritten to deliver better performance, higher levels of security or simply because the business has changed completely. A lot of it holds very few responsibilities as well, making it extremely obvious what to fix when a service breaks.

Don't get me wrong, we've seen problems we wouldn't have with 100% coverage. But that doesn't matter when spending resources fixing them is still a net positive on every account.

Do you have any metrics that you can share?

Yeah good luck explaining those 5 years to the business manager.

At one place I work, the target is 80% rather than 100%. Seems to be a saner target than full coverage.

I agree (mostly) with the author's standpoints, but his arguments to get there are not convincing:

> You don't need to test that. [...] The code is obvious. There are no conditionals, no loops, no transformations, nothing. The code is just a little bit of plain old glue code.

The code invokes a user-passed callback to register another callback and specifies some internal logic if that callback is invoked. I personally don't find that obvious at all.

Others may find it obvious. That's why I think, if you start with the notion "this is necessary to test, that isn't", you need to define some objective criteria when things should be tested. Relying on your own gut feeling (or expecting that everyone else magically has the same gut feeling) is not a good strategy.

If I rewrite some java code from vanilla loops-with-conditionals into a stream/filter/map/collect chain, that might make it more obvious, but it wouldn't suddenly remove the need to test it, would it?

>"But without a test, anybody can come, make a change and break the code!"

>"Look, if that imaginary evil/clueless developer comes and breaks that simple code, what do you think he will do if a related unit test breaks? He will just delete it."

You could make that argument against any kind of automated test. So should we get rid of all kinds of testing?

Besides, the argument doesn't even make sense. No one is using tests as a security feature against "evil" developers (I hope). (One of) the points of tests is to be a safeguard for anyone (including yourself) who might change the code in the future and might not be aware of all the implications of that change. In that scenario, it's very likely you change the code but will have a good look at the failed test before deciding what to do.

The article illustrates what happens when you have inexperienced or poor developers following a management guideline.

To see how 100% coverage testing can lead to great results, have a look at the SQLite project [1].

In my experience, getting to 100% takes a bit of effort. But once you get there it has the advantage that you have a big incentive to keep it there. There is no way to rationalise that a new function doesn't need testing, because that would mess up the coverage. Going from 85% to 84% coverage is much easier to rationalise.

And of course 100% coverage doesn't mean that there are no bugs, but x% coverage means that 100-x% of the code is not even run by the tests. Do you really want your users to be the first ones to execute the code?

As an anecdote, in one project where I set the goal of 100% coverage, there was a bug in literally the last uncovered statement before getting to 100%.

[1] https://www.sqlite.org/testing.html

Doesn't what you're writing actually influence whether 100% coverage is a worthy goal or not?

I mean, SQLite is a good example of something where 100% coverage would actually be useful, because it tries to maintain compatibility with the SQL spec and with Postgres (largely because Postgres complies with the spec). Testing that 100% makes a lot of sense.

Suppose you're instead doing something very UX/UI driven. Why bother trying to cover that 100% with automated tests, when change is going to be driven by the whim and fancy of anyone who sits down in front of it?

> Doesn't what you're writing actually influence whether 100% coverage is a worthy goal or not?

Yes, absolutely.

> Suppose you're instead doing something very UX/UI driven. Why bother trying to cover that 100% with automated tests, when change is going to be driven by the whim and fancy of anyone who sits down in front of it?

It might make perfect sense from a business standpoint to have no tests at all.

I might be completely wrong on this one, but it seems to me that a lot of the precepts of TDD and full code coverage have a lot to do with the tools that were used by some of the people that popularized this.

Some of my day involves writing Ruby. I find using Ruby without 100% code coverage to be like handling a loaded gun: I can track many outages to things as silly as a typo in an error handling branch that went untested. A single execution isn't even enough for me: I need a whole lot of testing on most of the code to be comfortable.

When I write Scala at work instead, I test algorithms, but a big percentage of my code is untested, and it all feels fine, because while not every piece of code that compiles works, the kind of bugs that I worry about are far smaller, especially if my code is type heavy, instead of building Map[String,Map[String,Int]] or anything like that. 100% code coverage in Scala rarely feels as valuable as in Ruby.

Also, the value of tests as a way to force good factoring changes with language and paradigm. Most functional Scala doesn't really need redesigning to make it easy to test: Functions without side effects are easy, and are easier to refactor. A deep Ruby inheritance tree with some unnecessary monkey patching just demands testing in comparison, and writing the tests themselves forces better design.

The author's code is Java, and there 95% of the reason for testing that isn't purely based on business requirements comes from runtime dependency injection systems that want you to put mutability everywhere. Those are reasons why 100% code coverage can still sell in a Java shop (I sure worked in some that used too many of the frameworks popular in the 00s), but in practice, there's many cases where the cost of the test is higher than the possible reward.

So if you ask me, whether 100% code coverage is a good idea or not depends a whole lot on your other tooling, and I think we should be moving towards situations where we want to write fewer tests.

IMO ruby is the worst because it's so easy, and often times encouraged, to write layers of useless tests that ultimately give only a false sense of security. I recently re-did the Michael Hartl tutorial to catch up with Rails5 and it gives examples where you test proper template rendering. I love that book but I'd rather shoot myself in the foot than spend dev time writing tests to check if the right html title attribute was rendered for the page...

Sometimes the right test is just occasional manual testing. And sometimes it's even just the fact that no users have complained about something obvious not being completely wrong.

Curious as to why you're blaming a language (ruby) for a bad practice (writing layers of useless tests). Excuse my pedantry, but wouldn't it be more accurate to blame a culture that you feel has grown up around that language?

In Scala, "does it compile/typecheck" will generally catch typos in code that isn't frequently executed, and raise warnings for unhandled cases. For Ruby, you pretty much need to execute the code to have any inkling if it goes kaboom -- the language (and how people write it) have so much magic that short of executing things, the computer can't tell you anything useful.

Valid point.

I don't think I could write low-bug count code faster in statically typed scala than I could in unit tested ruby though. I mean I am well aware of what you're telling me, and it's obvious to me that having the compiler automatically check certain properties is a win. And yet, when push comes to shove, to get something done I'm more likely to reach for ruby.

It's something I've never come up with a good explanation for. Does static typing stunt prototyping and exploration? Do unit tests capture high level goals better?

I guess I don't agree with the assertion that ruby tests are useless. Because you test higher level things than you do with scala types.

"Does static typing stunt prototyping and exploration?"

Not in my experience. It is probably about what you personally find hard/uncomfortable/unfamiliar.

But you're kind of implying that I feel that way because either I don't use static typing much, or I haven't learned it well, aren't you?

I've used static typing much more than dynamic. I'll admit confusion on things like ocaml polymorphic variants and haskell monads, nevertheless I wouldn't say I find static typing hard as a rule. But surely the point of static type checking is to constrict what you can do for safety and performance reasons. And surely the cost of that is you have less freedom - even when in the exploration phase.

"you're kind of implying that I feel that way because either I don't use static typing much, or I haven't learned it well, aren't you?"

No. I didn't mean to imply or assert anything of the sort. Apologies if I inadvertently gave offense. You seem to be reading meanings into my reply that aren't there.

I don't know you from Adam. You asked a question in your post. I answered as best as I could.

That said, your latest statement

"I'll admit confusion on things like ocaml polymorphic variants and haskell monads,"

does seem to imply that you don't have much real world experience with static type systems in production (nothing wrong with that) since neither is an arcane concept or particularly difficult to understand.

If you haven't worked extensively with Haskell/Ocaml/SML etc, and you are extrapolating properties of 'static type systems' from those of Java or C++, then your idea that such type systems 'stunt prototyping and exploration' might make sense.

The rest of your comments are extrapolations from misunderstandings - not born of practical experience. My answer was based on (strictly) personal experience. Which is why I said "in my experience". I gladly concede that YMMV.

Again, I was just answering the question in your original comment. I didn't mean to "imply" anything and used "probably" to mark my uncertainty about your real world experience with Ocaml/Haskell etc style static type systems.

I have extensive experience with both dynamically typed languages (mostly Python, Lua and Scheme) and statically typed languages (mostly Haskell and SML, besides Java). I answered out of my experience.

YMMV. And that is cool.

I'm not offended. I am happy to talk to open minded static typing advocates. I myself am not really sure where I sit.

You're right, I haven't used OCaml or Haskell in production. I did use F# though, which seems to be in the same ballpark as those languages in terms of having algebraic data types and inference and all the rest. I suppose the fact that I still don't really grok polymorphic variants after reading Real World OCaml a few times may say something about my ability, motivation, or at the very least how my brain is wired.

Fundamentally though, a type system is a restriction meant to help the programmer. This restriction must inevitably put you down a certain road when you explore stuff, right?

It depends on how often that freedom is used for a benefit rather than by accident and thus foot shooting.

It changes my approach to a problem, in that, with dynamic typing, I tend to think of problems as manipulating data, while with static typing I am modeling a domain, deserializing data to it, and serializing the result. The end result usually ends up being a bit more verbose, but at the same time does a better job of holding my hand when the input format changes in the future. It's definitely a trade-off though.
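
A rough sketch of the "model a domain, deserialize into it, serialize the result" style, written in Python with type hints for brevity. The `Order` type, its fields, and the `apply_discount` helper are hypothetical names invented for illustration, and Python only checks these hints if you run a separate type checker.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Order:
    """Hypothetical domain type; the field names are assumptions."""
    order_id: int
    total_cents: int


def parse_order(raw: str) -> Order:
    """Deserialize at the boundary, so malformed input fails here."""
    data = json.loads(raw)
    return Order(order_id=int(data["order_id"]),
                 total_cents=int(data["total_cents"]))


def apply_discount(order: Order, percent: int) -> Order:
    """Domain logic works on the typed model, never on the raw dict."""
    discounted = order.total_cents * (100 - percent) // 100
    return Order(order_id=order.order_id, total_cents=discounted)


def serialize_order(order: Order) -> str:
    """Serialize the result back to the wire format."""
    return json.dumps(asdict(order), sort_keys=True)
```

If the input format changes (say a field is renamed), the failure shows up in `parse_order` rather than somewhere deep in the logic, which is the hand-holding the comment describes.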

If you can achieve >90% testing, efficiently and properly, it's worth it. Testing for the correct title is worth it, especially if the title has something appended to it at render time (i.e. hello | website_name). Someone can override this on a page, and having a test to catch that is important. A well-written test would check for page_name | website_name across all pages. Pages are finite. Such tests are useful. Hartl uses a basic testing framework as an introduction, but there are better ones available.
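
A minimal sketch of such a check in Python, not tied to any particular web framework. The page registry, the override table, and the function names are all hypothetical; `website_name` is just the placeholder used in the comment.

```python
SITE_NAME = "website_name"

# Hypothetical page registry: page key -> page name shown in the title.
PAGES = {"home": "Home", "about": "About", "contact": "Contact"}

# Per-page title overrides, the kind of thing someone can get wrong.
TITLE_OVERRIDES = {}


def render_title(page_key):
    """Use the override if one exists, else append the site name."""
    if page_key in TITLE_OVERRIDES:
        return TITLE_OVERRIDES[page_key]
    return f"{PAGES[page_key]} | {SITE_NAME}"


def pages_with_bad_titles():
    """Check every page, since pages are finite, and list the offenders."""
    suffix = f" | {SITE_NAME}"
    return [key for key in PAGES if not render_title(key).endswith(suffix)]
```

A test would simply assert that `pages_with_bad_titles()` is empty; an override such as `TITLE_OVERRIDES["about"] = "About us"` would then make it fail.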

But remember nothing is free, nothing is a silver bullet. Stop and think.

I'm going to be the one to point at the elephant in the room and say: Java. More precisely, Java's culture. If you ask developers who have been assimilated into a culture of slavish bureaucratic-red-tape adherence to "best practices" and extreme problem-decomposition to step back and ask themselves whether what they're doing makes sense, what else would you expect? These people have been taught --- or perhaps indoctrinated --- that such mindless rule-following is the norm, and to think only about the immediate tiny piece of the whole problem. To ask any more of them is like asking an ostrich to fly.

The method names in the second example are rather WTF-inducing too, but to someone who has only ever been exposed to code like that, it would probably seem normal. (I counted one of them at ~100 characters. It reminds me of http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom... )

Many years ago I briefly worked with enterprise Java, and found this sort of stifling, anti-intellectual atmosphere entirely unbearable.

It might even make sense for many of those enterprise developers to have a set of best practices that they can blindly follow. Not every developer has that many years of experience or is sharp enough to be able to identify the cases where unit testing makes sense, and those where it doesn't.

Method names got my eyebrows up as well and I spend a lot of time in Java. I should consider myself lucky.

The big error being made in this article (and most of the comments here) is the assumption that the purpose of unit tests is to "catch bugs." It isn't.

The purpose of unit tests is to document the intended behaviour of a unit/component (which is not necessarily a single function/method in isolation) in such a way that if someone comes along and makes a change that alters specified behaviour, they are aware that they have done so and prevented from shipping that change unless they consciously alter that specification.

And, if you are doing TDD, as a code structure/design aid. But that is tangential to the article.

The big error being made in this article (and most of the comments here) is the assumption that the purpose of unit tests is to "catch bugs."

More like "in countless articles, comments, talks and projects over, say, the last decade".

Yes - it's pretty frustrating to try and have an important discussion when 99% of the time it is framed incorrectly from the start.

Unit tests are a poor substitute for correctness. Many unit tests does not a strong argument make.

Unit tests are typically inductive. Developer shows case A, B and C give the expected results for function f. God help us if our expectations are wrong. So, you're saying since A, B and C are correct therefore function f is correct. Well that may be, or maybe A, B and C are trivial cases, in other words, you've made a weak argument.
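
To illustrate the weak-argument problem, here is a small Python sketch. The leap-year function is a made-up example with a deliberate bug; cases A, B and C all pass, yet the function is wrong.

```python
def is_leap_year(year):
    """Deliberately incomplete: ignores the Gregorian century rules."""
    return year % 4 == 0


def run_cases(cases):
    """Return the inputs whose result differs from the expected value."""
    return [year for year, expected in cases if is_leap_year(year) != expected]


# Cases A, B and C: all pass, and all are the easy ones.
EASY_CASES = [(2020, True), (2023, False), (2024, True)]

# A case nobody wrote down: 1900 is divisible by 4 but was not a leap year.
HARD_CASES = [(1900, False)]
```

`run_cases(EASY_CASES)` reports no failures, while `run_cases(HARD_CASES)` reports 1900. The three passing cases said nothing about the inputs that matter.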

100% test coverage sounds like lazy management. Alas, the manager may have worked their way via social programming rather than computer programming. In such cases, better to say you have 110% test coverage.

Not sure if this is already mentioned but for me the most concise illustration of this fallacy was in The Pragmatic Programmer book. They had a function like this:

double f( double x ) { return 1/ x; }

They pointed out that it is trivial to get 100% coverage in test cases but unless your tests include passing in 0 as the parameter you are going to miss an error case.
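
A minimal Python rendering of that example, as a sketch only. Note one assumption that matters: Python raises `ZeroDivisionError` for float division by zero, whereas languages that follow the IEEE 754 defaults return infinity instead.

```python
def f(x):
    """Python rendering of the one-line function above."""
    return 1 / x


def test_f():
    """Executes every line of f, so a line-coverage tool reports 100%."""
    assert f(2.0) == 0.5


def f_at_zero():
    """What the fully 'covered' function does for the input the test skipped."""
    try:
        return f(0.0)
    except ZeroDivisionError as exc:
        return type(exc).__name__
```

`test_f` passes and covers all of `f`, yet `f_at_zero()` shows the error case that no coverage report would have flagged.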

Exactly, or to put it another way: you are not forced to cover the code you didn't write; in this case, the non-zero check before the division.

Mix this with unchecked exceptions, mocking of any real-world interaction that will generate errors in productions, and testing becomes a cargo-cult of quality.

passing 0 to f returns infinity (IEEE 754)

Not arguing the point here. It's just a terrible example.

You are both correct and wrong: I want X/0 to immediately crash (fatal error) before the bad data that caused me to try to divide by zero gets propagated farther. Now that I think about it, I really want X/Y where Y is "close" to 0 to crash too.

Of course I made the above up on the spot, but it is a reasonable thing to do for some situations. Those who know floating point math are well aware that dividing by something close to zero tends to result in very large errors, vs the true answer. (particularly if the division is part of a larger calculation)
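
One way to sketch that fail-fast behaviour in Python. The function name and the default tolerance are assumptions picked for illustration; a sensible threshold depends on the scale of the values in the surrounding calculation.

```python
def checked_divide(x, y, tolerance=1e-9):
    """Divide x by y, failing fast when y is zero or 'close' to zero.

    The tolerance is an arbitrary assumption, not a recommended value.
    """
    if abs(y) <= tolerance:
        raise ZeroDivisionError(f"divisor {y!r} is too close to zero")
    return x / y
```

With the guard written down, the near-zero case becomes a branch that a test has to exercise, instead of behaviour hidden inside the division operator.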

I thought you'd get NaN, not Inf. Of course I'm probably misremembering.

I think what the above returns depends on the language. In Java I think there is an actual runtime exception for this. The point of the original comment is to show how 100% coverage is easy to achieve but often meaningless.

There is some other more abstract concept related to the classes of input that can possibly be passed to a method. IMHO 100% coverage and "test first" have done more harm than good to the cause of automated testing.

/me raises hand on the pro-testing side

I've been programming for a living since 1996, and only recently started to do TDD in the normal sense of writing unit tests before writing code. I've found it to be an enormous help with keeping my code simple: the tests or the mocking getting difficult is a great indicator that my code can be simplified or generalised somehow.

I argued for functional instead of unit testing for years, but it was only when a team-mate convinced me to try unit testing (and writing the tests FIRST) that the scales fell from my eyes. Unit testing isn't really testing, it's a tool for writing better code.

BTW from an operational perspective I've found it's most effective to insist on 100% coverage, but to use annotations to tell the code coverage tool to ignore stuff the team has actively decided not to test - much easier to pick up the uncovered stuff in code review and come to an agreement on whether it's ok to ignore
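
As a sketch of what such an annotation can look like, assuming Python and coverage.py, whose default exclusion marker is the `# pragma: no cover` comment. The functions themselves are hypothetical.

```python
import sys


def parse_port(text):
    """Covered by tests: convert a port string to an int in the valid range."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# Team decision recorded next to the exclusion: thin CLI glue, checked by hand.
def main():  # pragma: no cover
    """Excluded from the coverage report; the marker shows up in code review."""
    print(parse_port(sys.argv[1]))
```

Because the marker sits on the `def` line, coverage.py leaves the whole function out of the report, so the remaining code can be held to 100% (for instance with its `fail_under` report setting) while each exclusion stays visible in the diff.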

I agree that TDD and unit tests can help you when writing code. But once your code has been written, are all those unit tests still useful? Or do they get in your way when you do large scale refactoring?

I have seen quite a few codebases with loads of unit tests that might once have been useful, but now were just slowing down development. Many of them should have been thrown away and replaced by a couple of functional tests; however, the sunk cost fallacy always kicked in. Nobody wanted to throw away all the hours of work that had gone into making them, not to mention the precious code coverage.

> But once your code has been written, are all those unit tests still useful?

The reason for this question is the naming mistake that was made in the beginning. They are not TESTS, they are SPECIFICATIONS.

Once your code has been written, are the specifications still useful? Well, duh... unless the business has changed them then yes, they're still useful.

> nobody wanted to throw away all the hours of work that had gone into making them

This is a REALLY common developer attribute that drives me nuts - people don't want to throw away stuff that they put a lot of effort into.

+1 - speaking as an ex-sceptic, the only people I know that don't like unit testing and/or TDD are the people who either misunderstand their purpose or have not learned how to utilise them properly yet.

Once TDD "clicked," the quality of my code shot up very rapidly indeed.

I wish people cared more about the craft of an amazing plugin architecture or an advanced integration between a machine learning system and a UI, but no, more and more of our collective development departments care more about TDD and making sure things look perfect. Don't worry about the fact that there are no integration tests and we keep breaking larger systems, and while there might be 100% code coverage, no developer actually understands the overall system.

Yes, unit tests have their place, but you have to understand why they are needed.

To me, it's secondary and has tones of bike shedding in it. Writing tests is easy, mentally. Getting a good, simple, YAGNI/DRY architecture is more challenging and requires several iterations, something that is resisted if you have to rewrite all your unit tests. Let me put it this way: if you write an architecture that you later don't like, you will be more hesitant to scrap it and start over because of all the additional work of the unit tests (especially if you're on a deadline). That's bad. To write a good architecture, a developer must be willing to realize he could have done it better and be willing to tear it down instead of building around the bad architecture. That's how you write 20-year code, which should be almost everyone's goal.

You may find a long rant I wrote recently interesting.


I've seen projects where management had rules like "you must have 70% code coverage before you check in". Which is crazy, for a lot of reasons.

But the developer response in a couple cases was to puff the code up with layers of fluff that just added levels of abstraction that just passed stuff down to the next layer, unchanged, with a bunch of parameter checking at each new level. This had the effect of adding a bunch of code with no chance of failure, artificially increasing the amount of code covered by the tests (which, by the way, were bullshit).

I got to rip all that junk out. It ran faster, was easier to understand and maintain, and I made sure I never, ever worked with the people who wrote that stuff.

If you can prove that your testing process is perfect, then your entire development process can then be reduced to the following, after the test suite is written:

  cat /dev/random | ./build-inline.sh | ./test-inline.sh | tee ./src/blob.c &&
  git commit -Am "I have no idea how this works, but I am certain that it works perfectly, see you all on Monday!" &&
  git push production master --force

When presented like this, relying on human intelligence and experience doesn't seem like such a bad thing after all.

Just so we're clear, my username was not inspired by this scheme.

It seems to me that such a "perfect" testing process would basically amount to declarative programming.

Indeed it is; a perfect test is equivalent to the code being tested.

Well...SQL is declarative, and we still see "Select * from Users"

Several other people are complaining about 100% coverage sometimes being misleading. One way to test your test is to randomly modify your code e.g. swapping a greater than for a less than. If your tests still pass, then they obviously missed this change in behaviour.

This is known as mutation testing.
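
A hand-rolled sketch of the idea in Python. Real mutation tools rewrite the code automatically; here the comparison operator is passed in so the "mutation" can be applied by hand. All names are hypothetical.

```python
import operator


def make_is_adult(compare):
    """Build the function under test from a comparison operator.

    Passing the operator in lets this sketch mutate the code without
    rewriting source text, which is the part real tools automate.
    """
    def is_adult(age):
        return compare(age, 18)
    return is_adult


def weak_suite(is_adult):
    """Runs every line of is_adult, but only with the boundary input."""
    return is_adult(18) is True


def strong_suite(is_adult):
    """Also checks both sides of the boundary."""
    return (is_adult(18) is True
            and is_adult(17) is False
            and is_adult(40) is True)


original = make_is_adult(operator.ge)  # age >= 18
mutant = make_is_adult(operator.le)    # age <= 18, the injected fault
```

Both suites give full line coverage of `is_adult`. The weak suite passes for `original` and for `mutant`, so the mutant survives; the strong suite passes for `original` and fails for `mutant`, which is the signal mutation testing looks for.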


I've tried it a few times. I generally found it too slow to be an everyday part of my routine, but it's an interesting tool to have in your belt for e.g. evaluating tests in a codebase you didn't write yourself.

Have you ever tried it with one of the automated tools? I use Pitest extensively to mutate my Java code bases and found it is not too much slower than regular line and branch code coverage tools when you use a history file so only diffs are needed. For me, the trade-off of a few more seconds of build time is worth the benefit of the coverage report and being able to fail the build if coverage drops too low.


A lot of people here seem to have strong opinions against 100% coverage, so I'll risk their ire with my strong opinion in favor.

If you have, say, 95% coverage -- and most corporate dev orgs would be thrilled with that number -- and then you commit some new code (with tests) and are still at 95%, you don't know anything about your new code's coverage until you dig into the coverage report. Because your changes could have had 100% coverage of your new thing but masked a path that was previously tested; or had 10% but exercised some of the previously missing 5%.
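
A worked example of that masking effect, with made-up numbers: a 1000-line code base at 95%, and two very different 20-line changes that both leave the headline figure untouched.

```python
def coverage_percent(covered_lines, total_lines):
    """Line coverage as a percentage."""
    return 100.0 * covered_lines / total_lines


# Assumed baseline: 1000 lines, 950 of them covered.
BASELINE = coverage_percent(950, 1000)

# Change A: 20 new lines, all covered, but 1 previously covered line
# is no longer executed by any test.
CHANGE_A = coverage_percent(950 - 1 + 20, 1000 + 20)

# Change B: 20 new lines, only 2 covered (10%), but the new tests
# happen to execute 17 previously uncovered lines.
CHANGE_B = coverage_percent(950 + 17 + 2, 1000 + 20)
```

All three values come out at 95%, since 969 / 1020 = 0.95, so the overall number alone cannot distinguish a well-tested change from a barely tested one.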

If you have 100% coverage and you stay at 100% then you know the coverage of your new code: it's 100%. Among other things this lets you use a fall in coverage as a trigger: to block a merge, to go read a coverage report, whatever you think it warrants.

Also, as has been noted elsewhere, anything other than a 100% goal means somebody decides what's "worth" testing... and then you have either unpredictable behavior (what's obvious to whom?) or a set of policies about it, which can quickly become more onerous than a goal of 100%.

It's important to remember that the 100% goal isn't going to save you from bad tests or bad code. It's possible to cheat on the testing as well, and tests need code review too. There's no magic bullet, you still need people who care about their work.

I realize this might not work everywhere, but what I shoot for is 100% coverage using only the public API, with heavy use of mock classes and objects for anything not directly under test and/or not stable in real life. If we can't exercise the code through the public APIs then it usually turns out we either didn't rig up the tests right, or the code itself is poorly designed. Fixing either or both is always a good thing.
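
A small sketch of that approach in Python using the standard library's `unittest.mock`. The `SignupService` class and its mailer dependency are hypothetical; the point is that the private helper is covered only by calling the public method.

```python
from unittest.mock import Mock


class SignupService:
    """Hypothetical class under test; only `register` is public."""

    def __init__(self, mailer):
        self._mailer = mailer  # external dependency, mocked in tests

    def register(self, email):
        address = self._normalize(email)
        if "@" not in address:
            return False
        self._mailer.send_welcome(address)
        return True

    def _normalize(self, email):
        # Private helper: exercised only through register().
        return email.strip().lower()


def test_register_sends_welcome_mail():
    mailer = Mock()
    assert SignupService(mailer).register("  Ann@Example.com ") is True
    mailer.send_welcome.assert_called_once_with("ann@example.com")


def test_register_rejects_invalid_address():
    mailer = Mock()
    assert SignupService(mailer).register("not-an-address") is False
    mailer.send_welcome.assert_not_called()
```

If some branch of `_normalize` could not be reached through `register`, that would be a hint that the branch is dead code or that the design needs another look.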

I don't always hit the 100% goal, especially with legacy code. But it remains the goal, and I haven't seen any convincing arguments against it yet.

Open the flame bay doors, Hal... :-)

The problem with the "100%" goal is that it provides a false sense of safety. There can still be plenty of bugs in code that has "100%" test coverage. The 100% is an illusion. To truly get to 100%, it does not suffice that every line of code is executed; you must also test all valid inputs, which is utopian.

I concede that for people who don't understand what unit testing does, 100% might provide a false sense of security. But I don't think that's a problem with the goal... that would be like saying the problem with staying fit is that it provides a false sense of longevity, whereas you could still get brain cancer or be hit by a car.

If you know what's going on -- and hopefully you do if you're making decisions like whether to pursue a 100% coverage goal -- then your point is (I hope) self-evident.

It's also why code review is necessary and not just the coverage metric. 100% coverage is not "100% of all things that could happen," it's "100% of all code execution paths" and (hopefully) a sample of all known types of valid and invalid inputs.

Anyway, so far I've found that most developers respond pretty well to the 100% idea if you explain its utility and what it does and doesn't get you. From a ridiculously small sample, granted.

The utopian risk, I think, is much more around the hope that people high up the food chain in your organization actually understand this stuff. Consider how many VPs of Engineering did a year or two of engineering work (before unit testing was in vogue) then got that MBA and have been busy taking meetings ever since.
