Why Most Unit Testing is Waste [pdf] (rbcs-us.com)
306 points by henrik_w on Mar 6, 2014 | 268 comments

From the paper: "Throw away tests that haven’t failed in a year."

Hell, no.

Our unit tests provide reassurance that when somebody revisits a component N years from now, and makes a change, they are significantly less likely to break the existing behaviour, even subtly.

Throwing away unit tests is like saying "This ship has been sailing for years and has never sunk - let's throw away the lifeboats!"

Now, yes - somebody could erroneously change the test condition to make it pass, although hopefully that kind of change would be spotted by even a cursory code review. You can say the same thing about developers who carelessly suppress compiler warnings without understanding what they're telling them. However, the support is there in case I, or another developer, make a mistake.

FWIW, catching future mistakes is definitely not the only thing for which we use Unit Tests. The problem domain is hard, and for every test fixture we've written we've found at least one bug. Catching and debugging these bugs from the unit test runner is a lot easier than spotting and debugging these kinds of issues at runtime.

I think it'd be impossible to know which tests haven't failed. What if they failed on a developer machine because he introduced a regression, but it was fixed prior to checkin?

Your CI build thinks the test hasn't failed, but instead it was performing an important job.

Those tests probably are a good place to start if you think you have useless tests, but it's simply a heuristic.

"When I was programming on a daily basis, I did make code for testability purposes but I hardly did write any unit tests. However I was renowned for my code quality and my nearly bug free software. I like to investigate WHY did this work for me?"

Here's why: anything that focuses attention during development improves quality. Unit tests can be that thing but that means that their value is in a place that Cope is not describing.

See: http://michaelfeathers.typepad.com/michael_feathers_blog/200...

A good design is going to provide more value over time than a good test suite. A lot of developers spend a lot of time writing tests for terrible design on prototype codebases. Those prototypes eventually are thrown into production full of bugs and the answer tends towards "more tests!" instead of "better design!"

Most developers just keep building on top of the same codebase until it is a total mess, long after the team has learned a bunch and should refactor or throw away the code and build something designed for production use.

There is really not enough discussion about when the right time to do thorough testing is, and when a project transitions from prototype to production quality.

I realize things like software design, planning, and thoughtfulness are heresy in the agile world of 2 week sprints, but most of these things are our own fault due to a lack of planning and craftsmanship. Complain all you want, but in general it's our own fault.

100% agree. The problem is that the number of tests and test coverage are measurable. Good design isn't (at least not in the short term). There is something to the argument that thinking about testability helps with design, but in practice this often does not help.

Yet another problem is there is never time to throw away and build something new. In the life-cycle of most companies/products the available time decreases as a function of time since now you have customers, you have bugs, you have a fatter organization to feed, etc. The time to get the design right is the beginning and you have to do that fairly quickly...

> Most developers just keep building on top of the same codebase until it is a total mess, long after the team has learned a bunch and should refactor or throw away the code and build something designed for production use.

That's exactly where a good test suite comes to save you. If you have tests you can refactor as much as you want until you get the code you want. If you don't, you will never have enough time to refactor the code base because the effort of manually testing will be too high. And as a result your code will rot over time.

Yes, but if you are leaning on your test suite to verify a huge refactor, you are unlikely to make the kind of sweeping design changes that are sometimes needed.

For example, say your ORM is causing major trouble and you need to replace everything from the controller back to the database. Heck, maybe you decouple the whole backend out into an API. The only tests that help you are the end to end tests. The rest of your test suite for your backend needs to be thrown away as much as your backend itself does.

More often than not the unit tests will have assumptions about the design in them so they will actually stand in the way of refactoring rather than facilitate it. If you have class A with unit tests this is basically a design decision reflected right there. If having class A is bad design fixing that requires dealing with tests + class A and the tests are a hindrance rather than help.

I'm back in Java-land these days, which is culturally very pro-unit testing. After getting exposed to it again for a few months I've come to side with the author here. I've never really been comfortable with the amount of time certain people dedicate to unit testing, especially the TDD crowd, but in my hiatus something has arisen in popularity which has made it all the worse: mockito.

Prior to mockito, unit testing was (more or less) limited to testing that your methods behaved as expected, and would occasionally expose NullPointerExceptions or other exceptional conditions. Dependent objects were either simplified or simply ignored. With the rise of mock object frameworks, however, your tests specifically say "this method on this mock will be called X number of times, with this result". Mind you, this is all happening in the context of another method call. So, for example, if you were testing the method "calculateDueDate", and that method took a DateTime object, your test might look like this:

        private MyClass myClass;
        private DateTime mockDateTime;

        @Before
        public void setup() {
            myClass = new MyClass();
            mockDateTime = mock(DateTime.class);
        }

        @Test
        public void specificDueDateShouldBeTenDaysFromNow() {
            DateTime result = myClass.calculateDueDate(mockDateTime);
            verify(mockDateTime, times(2)).getHour(); // contrived
        }

The problem with this is that the tests become obstacles in the way of refactoring the code. Should you decide that you don't want to use the DateTime library any longer you will have to not just change the code which is using it but the tests as well. Or what if, going back to the example above, you decided not to use the getHour() method any more? Every test referencing that will have to be changed. And changing those tests is very likely to be more involved than changing the code under test, because there frequently are more tests than code. This has a negative impact on the design of the application. Because companies rarely dedicate resources to making existing software better purely for its own sake, you tend to have to do what you can when you can. This means your time is limited to make that refactor, or upgrade that library, or do whatever change it is that needs to be done. Unit tests, especially those that use mocks, can get in the way of this to such an extent as to make such efforts impossible.

I think testing is important. I do not, however, share the belief that it is the sole, or even primary, determinant of code quality. In fact, an overreliance on unit testing can easily be a net negative. Should unit tests be thrown out? No. Baby with the bathwater and all that. But they should not be viewed as a silver bullet, either. They're not. They can help, but they can hurt.

I've had two experiences with unit testing recently that have made me a believer.

One of them was that I was working on a team where a programmer quit and I had to get a very complex codebase ready for production. The last programmer was terrible, the kind of guy who had trouble making primary keys that were unique, where any code that could possibly have a race condition did, and so forth. The code had unit tests, however, and that made it salvageable, and eventually I got the system to a place where it worked correctly and the customers loved it.

In nine months of effort on this, I ran into one refactoring where it felt the unit tests were a burden, and that involved about a day of work rewriting the tests. Unit tests are more likely to be a problem, however, when they add to the time of the build process. For instance, I wrote something in JUnit that hammered part of the system for race conditions, and this was key to fixing races in that part of the system. It fired off a thousand threads and took two minutes to run, and adding two minutes to your build is a BIG PROBLEM, particularly if anybody who wants to add two minutes to your build can do so and if anybody who wants to remove two minutes from the build is called "a complainer" and "not a team player". Overall the CPU time it takes to run is more likely to be a problem than the developer time it takes to maintain them.

As for Mockito I have found it is a great help for writing Map/Reduce jobs. As I don't own a big cluster and as I sometimes like to code on the run with my laptop, an integration test typically takes ten minutes with Amazon Elastic Map/Reduce. It takes some time to code up tests, but I get it all back with dividends because often I get jobs running with two or three integration test cycles instead of ten or twenty. When I find problems in the integration tests, usually I can reproduce them in the unit tests and solve them there.

Now, it did take considerable investment to get to the point where unit testing worked so well for me. I used to have problems where "the tests worked" but the real application didn't because Hadoop reuses Writable objects so if you just pass a List of objects to the reducer, you might get different results in a test than you do in real life. Creating an Iterable object that behaves more like Hadoop does solved that problem.
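A rough sketch of what such an iterable could look like, with no Hadoop classes involved: `Box` is a hypothetical mutable value type standing in for a Writable, and `ReusingIterable` and `ReuseDemo` are illustrative names, not anything from Hadoop or from the code described above. The iterator hands back the same instance on every step, overwritten in place, so code that stores references instead of copying values misbehaves in the unit test the same way it would on the cluster.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable value type standing in for a Hadoop Writable.
class Box {
    int value;
    Box(int value) { this.value = value; }
}

// Iterable that returns ONE shared Box, overwritten on each next() call,
// imitating the object reuse described above.
class ReusingIterable implements Iterable<Box> {
    private final List<Integer> values;
    ReusingIterable(List<Integer> values) { this.values = values; }

    @Override
    public Iterator<Box> iterator() {
        final Iterator<Integer> source = values.iterator();
        final Box shared = new Box(0);
        return new Iterator<Box>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public Box next() {
                shared.value = source.next(); // overwrite in place, no new object
                return shared;
            }
        };
    }
}

class ReuseDemo {
    // Buggy: keeps references, so every stored element aliases the shared Box.
    static List<Integer> collectByReference(Iterable<Box> in) {
        List<Box> kept = new ArrayList<>();
        for (Box b : in) kept.add(b);
        List<Integer> out = new ArrayList<>();
        for (Box b : kept) out.add(b.value);
        return out;
    }

    // Correct: copies the value out before the next step overwrites it.
    static List<Integer> collectByCopy(Iterable<Box> in) {
        List<Integer> out = new ArrayList<>();
        for (Box b : in) out.add(b.value);
        return out;
    }
}
```

Fed a plain List of distinct objects, both methods return the same thing, which is exactly how "the tests worked" can hide the bug. Fed the reusing iterable, only the copying version returns the original values.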

Generally if you are feeling that "unit testing sucks" or "mockito sucks" it's often the case that you're not doing it the right way.

> Generally if you are feeling that "unit testing sucks" or "mockito sucks" it's often the case that you're not doing it the right way.

Well, explain further. I hate these smart-arse-sounding comments - "you are doing it wrong" without any indication why, or how to do it better.

My sense is that one should make the unit tests to be as resilient as possible to refactoring and changes. This means that for so long as the public behavior of the class does not change, one should not need to do much if any updates to the tests.

Thus any test that is written in such a way that it would present an issue when refactoring code should be avoided if at all possible. A simple example is directly constructing the class under test in the test method:

  public void tryToAvoidDoingThis() {
     MyClass myClass = new MyClass(param1, param2);
     // do stuff to my class
  }

If this is done for each test method, then when the constructor parameters change, e.g. a new one is added, each of the constructor calls in the test methods will have to be updated.

Instead, have a level of indirection and have a single method that can create a sample MyClass. Now when the parameters change, only one construction site has to be updated.
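A minimal sketch of that indirection, assuming a made-up `MyClass` with two constructor parameters (the fields, the `accepts` method, and the `sampleMyClass` helper are all hypothetical names for illustration):

```java
// Hypothetical class under test; the name and fields are illustrative only.
class MyClass {
    private final String name;
    private final int limit;

    MyClass(String name, int limit) {
        this.name = name;
        this.limit = limit;
    }

    boolean accepts(int amount) { return amount <= limit; }
    String name() { return name; }
}

class MyClassTests {
    // The only place in the tests that knows the constructor signature.
    static MyClass sampleMyClass() {
        return new MyClass("sample", 10);
    }

    static void acceptsAmountAtLimit() {
        MyClass subject = sampleMyClass();
        if (!subject.accepts(10)) throw new AssertionError("10 should be accepted");
    }

    static void rejectsAmountOverLimit() {
        MyClass subject = sampleMyClass();
        if (subject.accepts(11)) throw new AssertionError("11 should be rejected");
    }
}
```

If the constructor later gains a third parameter, only `sampleMyClass()` needs to change; the test methods stay as they are.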

In general, unit tests should not be testing specific / internal implementation details of the class. Rather, the tests should verify the documented public behavior of the class.

That's the crux of good tests. That they test the unchanging interface between parts of code, without enforcing no change in implementation.

Interestingly it's also the crux of good design. Breaking code up into composable pieces that aren't braided together.

There's an inconsistency here: unit tests should depend only on the public behavior of a class; the constructor is public; constructor calls should nevertheless be avoided where possible.

Factoring out a common constructor in tests is an example of making the tests resilient against changes in the underlying code. If the constructor changes, the tests need to be fixed in one place, not in 50.

Other examples may be around a `setup` method. If the method is private, don't test it. Then you can refactor freely. If it's public, test the pre/post conditions around the method in as few places as possible (hopefully one). Even if other tests rely on the object having been "setup", just trust that it works. If the specification of `setup` changes, you just have the `setup` tests to update, not the entire object.

As with any process, you have to do what works for you. First, to steal from the recent airbnb article, the bar for testing has to be so low you trip over it. The testing framework should make it easy to get down to writing tests.

Second, start writing tests to verify bug reports and then fix the bug. In large systems I find this critical to homing in on the exact problem. The mental exercise of crafting the test to trigger the bug helps me really understand the problem.

Finally, start new features by first writing a test to simply drive your new code's golden path. When working in a large system I find that writing a test to run my new code gives a much faster development turnaround than rebooting the entire system. This is compounded when there are many systems or moving parts, which is common today.

I love TDD, but that's classic No True Scotsman.

> Well, explain further. I hate these smart-arse-sounding comments - "you are doing it wrong" without any indication why, or how to do it better.

With unit tests, there are certain things that must be tested, such as very high-value code contracts, and the like. There are many things that people test (like "correct output" for the input) which may not be so valuable, particularly if several possible values may be correct.

So test contracts, not internals, and not representation. And please, for heaven's sake, don't test the behavior of your dependencies.

Same here.

Unfortunately I also believe those comments are generally true, and I also believe when the posters answer "why", they will give you an answer that is also doing it the wrong way.

I have no idea how to do unit testing. I only believe there is a right way.

> The last programmer was terrible... The code had unit tests, however, and that made it salvageable

This doesn't make sense, how could you trust his unit tests?

Because even if they're falsely passing, hopefully between the code and the tests you could glean more intent than through the code itself.

Exactly. Even poorly written tests are far better than none at all.


By reading the unit tests and code comments, and comparing them against bits of the actual codebase, you can gain a better understanding of what the previous programmer was thinking and what he was trying to accomplish.

That's the most obvious benefit of having tests for me. Documentation can be outdated and even if it isn't it very rarely contains examples of use. Passing tests are for me exactly that. An up to date example of how the thing should be used and what can I expect from it.

In the old system a lot of things worked right.

For better or worse, this system had a number of data-centric objects; these objects passed many correct tests, but they failed to be deserializable from XML (because of the way collections were handled) as well as having other deficiencies.

The tests meant I could fix those deficiencies quickly and have faith that I fixed them correctly. Of course, I added new tests to test that the system did the things it had to do.

So... Unit Tests are good because, if you take over a code base written by someone who is incompetent, and if that person wrote terrible unit tests which should have been failing (but are passing, due to said incompetence), you have slightly more information to work with when fixing the original code? Seems like a weak argument to me.

In those cases, you will accept any help you can get. Trust me.

Sure, but it's hardly a reason to write unit tests.

I write unit tests for every one of my core algorithms. I have also seen a massive amount of dumb tests written by TDD people who strive for 100% code coverage. Ridiculous. It takes as much time to maintain the (often useless) tests as it does the code. Where they make sense they're indispensable.

Agreed. It's not a reason to dismiss tests though.

I think the fundamental question is why 100% code coverage is important. The fact is that it isn't. The problem with TDD is that a lot of people who do it get the reason for testing entirely wrong. The goal should never be to "ensure your code functions properly." The goal should be to "ensure the code contracts are adhered to." If you test with that in mind, you will write lots of unit tests and almost never have to rewrite or delete them due to fixing bugs.

Again this comes back to what I said in another comment that tests should never be written to the code. Once you get that, then code coverage ends up being meaningless and not something you want to worry about.

Weak only if denial against unit-testing is too strong.

This is probably not best practice, but I often disable tests that take a really long time. An alternative would be to have a 'full suite' and a 'fast suite'. The fast subset could be used locally for most development, but then you run the full superset when you are ready to release. A 5 minute release is no big deal, but if it takes 5 minutes to run a standard dev test, then people are not going to test as much.

I also disable tests by default that require a working installation to run. This allows me to have a test suite that can be run prior to installation and a larger support test suite that can pinpoint problems on production systems.

It is OK to have one fast suite of tests that is run all the time and one slower but more detailed suite that runs only at some checkpoints (overnight, weekly, before release).
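To make the split concrete, here is a toy sketch in plain Java with no test framework involved. `Check`, `Suites`, and the two sample checks are invented for illustration; a real project would more likely lean on whatever grouping its test framework offers (JUnit categories or tags, for instance).

```java
import java.util.ArrayList;
import java.util.List;

// A named check plus a flag saying whether it is too slow for every edit-compile cycle.
class Check {
    final String name;
    final boolean slow;
    final Runnable body;

    Check(String name, boolean slow, Runnable body) {
        this.name = name;
        this.slow = slow;
        this.body = body;
    }
}

class Suites {
    // Runs the fast checks always and the slow ones only when includeSlow is true.
    // Returns the names of the checks that actually ran.
    static List<String> run(List<Check> all, boolean includeSlow) {
        List<String> ran = new ArrayList<>();
        for (Check c : all) {
            if (c.slow && !includeSlow) continue;
            c.body.run(); // a failing check throws and stops the run
            ran.add(c.name);
        }
        return ran;
    }

    // Two hypothetical checks: one cheap, one standing in for a long stress test.
    static List<Check> sample() {
        List<Check> all = new ArrayList<>();
        all.add(new Check("parsesEmptyInput", false, () -> {
            if (!"".trim().isEmpty()) throw new AssertionError();
        }));
        all.add(new Check("hammersForRaces", true, () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum != 499_999_500_000L) throw new AssertionError();
        }));
        return all;
    }
}
```

Local development would call `Suites.run(checks, false)`, while the nightly or pre-release build would call `Suites.run(checks, true)`.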

Big projects over some size normally do it that way.

If you've got a slow test suite, that means you've got a bad test suite. Taking away tests from it to make it faster takes away from its purpose, which is to aid you in refactoring.

Or it means you are testing something that's computationally expensive. Not everything is just web model input validation--some people are doing real work. :P

Not necessarily. I've worked on math-heavy programs where single calculations could take seconds to run. For the frequency they come up in actual use, this was not a problem, but our tests needed to run these calculations more than any single execution of the program would likely need to.

Or, it just means you have integration tests mixed with your unit tests.

Oh? Consider http://randomascii.wordpress.com/2014/01/27/theres-only-four... .

More specifically, consider an SSE2-based function 'float32 floor(float32)'. There's only about 4 billion inputs, so why not test them all? That only takes a minute or so.

How is testing 100 inputs a unit test and testing 4 billion inputs, through exactly the same API, an integration test?

As the author points out, many people wrote libraries which are supposed to handle the entire range, but ended up making errors under various conditions, and even gave wrong answers for over 20% of the possible input range.
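For a feel of what such a sweep looks like, here is a sketch in Java rather than the SSE2 code from the linked article. `myFloor` is a deliberately naive, hypothetical implementation (it truncates and adjusts, and as a result mishandles -0.0), and `Math.floor` is assumed to be the trusted reference. The loop walks raw bit patterns, so every float, including NaNs, infinities, and denormals, is reachable.

```java
class ExhaustiveFloor {
    // Hypothetical hand-written floor under test; NOT the implementation from the article.
    static float myFloor(float x) {
        if (Float.isNaN(x) || Float.isInfinite(x)) return x;
        if (Math.abs(x) >= 8388608.0f) return x; // 2^23 and above: already an integer
        float t = (float) (int) x;                // truncate toward zero
        if (t > x) t -= 1.0f;                     // step down for negative non-integers
        return t;                                 // note: -0.0 comes back as +0.0
    }

    // Compares myFloor against Math.floor for `count` consecutive bit patterns
    // starting at `startBits`, and returns how many disagree.
    static long countMismatches(int startBits, long count) {
        long mismatches = 0;
        int bits = startBits;
        for (long i = 0; i < count; i++, bits++) { // int overflow wraps, which is intended
            float x = Float.intBitsToFloat(bits);
            float expected = (float) Math.floor((double) x);
            // floatToIntBits collapses all NaNs to one pattern but keeps -0.0 distinct from 0.0
            if (Float.floatToIntBits(myFloor(x)) != Float.floatToIntBits(expected)) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

Calling `ExhaustiveFloor.countMismatches(0, 1L << 32)` covers every 32-bit pattern through the same API a hand-picked handful of cases would use. A spot check on the single pattern `0x80000000` (that is, -0.0) is enough to expose the sign-of-zero slip in the naive version.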

That's not how logic works.

The statement was very causal.

> If you've got a slow test suite, that means you've got a bad test suite.

Or you have a fantastic test suite that also includes integration tests. See?

Is 90 seconds to test a function "slow"? What about 4.5 minutes to test three functions?

If you say it's slow then either it's a bad test suite, and/or it includes integration tests. I believe that is the logic, yes?

There is no lower unit to test, so therefore this must be a unit test.

The linked-to page shows that testing all possibilities identifies flaws that normal manual test construction did not find. Therefore, it must be a better test suite than using manually selected test cases, with several examples of poorly tested implementations.

(Note: writing an exact test against the equivalent libc results is easier to write than selecting test cases manually, and it's easier for someone else to verify that the code is testing all possibilities than to verify that a selected set of corner cases is complete.)

Therefore, logic says that it is not a bad test suite.

Since it contains unit tests and it is not a bad test suite, therefore it must not be slow.

Therefore, 4.5 minutes to unit test these three functions is not "slow".

Therefore, acceptable unit tests may take several minutes to run.

That is what the logic says. Do you agree? If not, where is the flaw in my logic?

How can you have a good test suite without integration tests? That's not a full test suite. That's a cop-out.

A good test suite has two qualities - how comprehensive it is and how fast it runs. If either is lacking, then it is no longer a good test suite.

It's quite easy to have slow tests that aren't integration tests. For instance, there are some tests in SymPy that are only a few lines of code but run very slowly because the calculation is difficult. Sometimes (but not always), it's trying to calculate a very difficult integral (which is a test of integration, but not an integration test).

Or it just means you have tests which could be better optimized for speed but in fact are optimized for something else.

We had a series of tests (more towards integration tests I guess) at one point in LedgerSMB that did things like check database permissions for some semblance of sanity. These took about 10 min to run on a decent system. The reason was we stuck with functionality we could guarantee wouldn't change (information schema) which did not perform well in this case. Eventually we got tired of this and rewrote the tests against the system tables cutting it down to something quite manageable.

We had this test mixed in with db logic unit tests because it provided more information we could use to track other failures of tests (i.e. "the database is sanely set up" is a prerequisite for db unit tests).

Heavy computation algorithms. My main focus is on geospatial analysis, and to test certain things, you are going to end up with some 1000ms+ tests. Get 10 or 20 of those, and you have a problem.

> Generally if you are feeling that "unit testing sucks" or "mockito sucks" it's often the case that you're not doing it the right way.

Either that, or the person just hasn't been sufficiently burned by someone changing something you're not aware of and having to track down a run time error for days that could have been caught and fixed by a unit test in minutes.

I read the article, and much of what he speaks of is tautological unit tests - testing something where someone could never have done anything but that. I've seen people unit test the behavior of filling up a collection then test to make sure the collection has all of those elements, for instance. And there, he has a point.

But there's a dangerous line there. And while the article makes a good point that too much of that type of testing can be detrimental, I've generally found it better to err on the safer side of more tests.

On your own project, or even on a small team, you can probably get away without them much of the time. But, on larger projects, where many different developers sometimes go back and make changes to code they didn't write, it's very easy for a new guy/gal to make changes with unanticipated consequences. When that occurs without sufficient test coverage, the project will wind up spending 10-20x more man hours to fix the issue.

There's a difference between unit tests and integration tests. Individual unit tests should be on the order of a millisecond or less so you can rip through them very quickly. If you use Surefire and Maven, name integration tests with the suffix ITCase (can override) and then you can run either the unit tests or the integration tests with mvn test or mvn integration-test.

http://randomascii.wordpress.com/2014/01/27/theres-only-four... tested all 4+ billion inputs to ceil(), floor(), and round(), and pointed out that many libraries actually had high error rates, because of incorrect support for rounding, small values, NaN, and -0.0.

Each test is extremely fast. Testing all 12+ billion cases takes 4.5 minutes. Are these unit tests or integration tests?

If they are integration tests, roughly where is the boundary between the two?

Is it meaningful to distinguish between "unit" and "integration" testing based only on the amount of time they take? That is, if the unit tests take 0.1 second too long then do they suddenly become integration tests?

I call those unit tests.

I think the problem is that definitions that should reflect the granularity of what is being tested have become too interconnected with assumptions about frequency of testing / time to run and it has created completely mangled definitions.

The problem is so bad that some developers I've crossed have the firm opinion that mstest should only be used for "unit" tests. It is frustrating.

If you have slow unit tests, don't run them on every compile. If you have fast integration tests, run them as much as you'd like. I've personally never defined unit tests as "things that run fast". That is a valuable property, but it does not seem essential to the definition. Perhaps I have the wrong understanding of what a unit test is.

12B test permutations is not your typical scenario, though 4min is pretty damn quick for all that. I'm asserting that for a given project module, it is beneficial to be able to run the test suite in a short time, say 10-15s. If you've got to wait minutes, then it is more integration.

The longer your unit tests take, the less likely people will be to use and run them often, which is the whole point. Let the nightly build on the CI machine exercise the long running tests when everyone is asleep.

True. Only a few tests are fast enough to run 12B tests within a few minutes.

Really, I think the problem is that unit test frameworks are currently incapable of doing the right testing.

Unit tests take quadratic time. That is, each new test requires running all previous tests, to get the green. And at some point, a project will have enough tests that it can't finish in 10-15s.

One option is to mark "fast" and "slow" tests. Another is to recategorize them as "unit" vs. "functional" tests.

These are poorly-defined labels. In this case, the 4.5 minutes of testing is "slow", yes, but it only needs to be run when a specific, small part of the code changes. The problem is, there's no way to determine that automatically. The test runner can't look at the previous test execution path and see that nothing has changed, and there's no way to mark that a test should only be run if code in functions X, Y, or Z of module ABC has changed.

Humans are able to figure this out. Well, sometimes. And with lots of mistakes. Get the unit test framework to talk with a coverage analysis tool, plus some static analysis and perhaps a few annotations, and this discussion of how to distinguish one set of tests from another disappears.

Blue-sky dreams. I know. :)

In real life we toss those functions into their own library, note that the code is static, and do the full test suite only occasionally; mostly when the compiler changes.

In other words, bypass CI the same way one does any other third party library. (How often do you run the gcc test suite?)

Various test runners do just this. Maven on TeamCity ranks the tests by their volatility (recently failed first), then by run duration. The point is to run the most likely to fail and historically most brittle tests first and the slow stuff last so you can fail fast.
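The ordering itself is easy to sketch. The following is not TeamCity's or Maven's actual logic, just a toy version of "recently failed first, then shortest first"; `TestHistory` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical record of one test's history; field names are illustrative.
class TestHistory {
    final String name;
    final boolean failedRecently;
    final long lastDurationMillis;

    TestHistory(String name, boolean failedRecently, long lastDurationMillis) {
        this.name = name;
        this.failedRecently = failedRecently;
        this.lastDurationMillis = lastDurationMillis;
    }
}

class FailFastOrder {
    // Recently failed tests first, then shortest duration first.
    static List<String> order(List<TestHistory> tests) {
        List<TestHistory> sorted = new ArrayList<>(tests);
        sorted.sort(Comparator
                .comparing((TestHistory t) -> !t.failedRecently) // false sorts before true
                .thenComparingLong(t -> t.lastDurationMillis));
        List<String> names = new ArrayList<>();
        for (TestHistory t : sorted) names.add(t.name);
        return names;
    }
}
```

This only changes when the feedback arrives. Every test still runs on every build.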

The goal is similar, but it isn't the same thing.

That still means to run all the tests each time, with re-prioritization to enrich the likelihood of faster feedback.

But if none of the code paths used for a test have changed, and the compiler hasn't changed, and there's nothing which depends on random input or timing effects, then why run those tests at all?

The reason is we don't have a good way to do that dependency analysis, which is why we run all of the tests all of the time. Or we manually partition them into "slow" and "fast" tests.

Code instrumented for coverage tells you which tests executed which portions of code. As I remember, Google's C++ build/test system was using this by late 2009 to efficiently run all tests on all checkins to HEAD.
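In its simplest form the selection step is a set intersection. The sketch below assumes the coverage data has already been collected as a map from test name to the code units that test executed; the names and the map format are made up, and this is in no way a description of how Google's system worked. It also ignores the caveats raised earlier in the thread (compiler changes, random input, timing effects).

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TestSelection {
    // coverage maps a test name to the set of code units it executed on its last run.
    // Returns the tests that touched at least one changed unit, in sorted order.
    static Set<String> affectedTests(Map<String, Set<String>> coverage, Set<String> changed) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : coverage.entrySet()) {
            if (!Collections.disjoint(entry.getValue(), changed)) {
                affected.add(entry.getKey());
            }
        }
        return affected;
    }
}
```

With this in place, a 4.5 minute exhaustive test would only be scheduled when one of the functions it covers shows up in the changed set.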

> Individual unit tests should be on the order of a millisecond or less so you can rip through them very quickly.

So now we have to write tests for all our code, and tests that run fast. We have to refactor our code around the tests. Something seems a bit back to front here. Or is the assumption that if we have 100% test coverage that runs fast then it means that we have written the best code possible?

I think I am siding with the author of the article on this one.

I find when I have to re-factor code around the tests it means the code wasn't very good in the first place.

The author complains about re-factoring code into smaller testable functions. I completely disagree. Code structured as small easily understood functions which do one thing and have obvious inputs and outputs is good code which is much easier to extend and modify.

Yeah, that's one of the article's weaker spots. But as a rule, I'd tend to interpret imprecise statements like that charitably. He's not saying that small, clear functions are bad, but that splitting functions for the purposes of testing is counterproductive. He's not saying anything about splitting for clarity and focus.

Note that the original article doesn't disavow all unit tests; nor does it disavow all testing.

Sounds to me like the kind of tests you're describing aren't necessarily unit tests (system-level race conditions aren't typically discoverable with a simple unit test); where the tests were truly unit-level, the proposed alternative (assertions) may have been even more informative. Finally - a few real unit tests for known correct behavior are advised where in essence the algorithm can be described independently from its outcome.

I agree that this usage of mocks is detrimental.

I think the only verification that should be done with mocks (the verify method) is that important collaborators have been called. Things that run off and mutate some global state, or perform an action.

Stubs are useful for controlling the execution state of the method under test. You could stub DateTime such that it was midnight in one test, and noon in another, if the behavior of the function relied on that. It does introduce a dependency that has to be changed along with the code, but is often helpful.
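As a sketch of the midnight/noon idea without a mocking library: the example below swaps in the JDK's `java.time.Clock` instead of the `DateTime` class from the earlier example, and `Greeter` and `GreeterStubs` are hypothetical names. The code under test reads the time only through an injected clock, so a test can pin it with `Clock.fixed`.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

class Greeter {
    private final Clock clock;

    Greeter(Clock clock) { this.clock = clock; }

    // Behaviour depends on the current time, read only through the injected clock.
    String greeting() {
        int hour = LocalTime.now(clock).getHour();
        return hour < 12 ? "Good morning" : "Good afternoon";
    }
}

class GreeterStubs {
    // Builds a clock frozen at the given ISO-8601 instant, interpreted in UTC.
    static Clock fixedAt(String isoInstant) {
        return Clock.fixed(Instant.parse(isoInstant), ZoneOffset.UTC);
    }
}
```

One test can construct `new Greeter(GreeterStubs.fixedAt("2014-03-06T00:00:00Z"))` and another can use `"2014-03-06T12:00:00Z"`, each asserting only on the returned string. Production code would pass `Clock.systemUTC()` or similar.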

I do agree with the author that there's an art to testing appropriately. You could mechanically write testing code for every line of code under test (if statement with single condition? Two tests to verify each path is taken!), which leads to your app being coded twice: once in code and once (or more) in the tests.

The only thing mocks are good for IMHO is when the real code takes too long to run. Actually testing that a function is called twice is not what you're testing and this level of coupling to the supposedly encapsulated behavior of the method is ridiculous. The method is not called "callMethodInClassFooTwice()" it's called calculateInterestRate() or whatever and as long as it does that right that's all that's important.

Calls to external services and hardware, or really, any external system. It's probably quite important that your code interacts properly with the system, but you also probably don't want to interact with it every time you run the tests. Not unless you have a copy of the external system running for every developer.

> It does introduce a dependency that has to be changed along with the code, but is often helpful.

And that's the rub. Mocks are useful. They help you write tests that are more robust, without having to resort to massive and complex @Befores. At the same time, though, they have an effect that is more negative than positive on the ability to change the code under test.

Well I know it's a contrived example, but I don't understand the motivation behind mocking an external library's code. That library should have its own tests.

Say I have three layers of custom code: A calls B, which calls C.

If I want to test B, then I want to mock C, and have my test call B similar to how A does. I want to mock C because C is also custom, and if my test fails, I want to know if it's because of bad B implementation, and not because a buggy C might be confusing matters.

But if B also calls D from an external open source or vendor-supplied library, I don't usually want to mock D. That just adds needless complexity to the test, and reduces my focus on my own custom code.

An exception would be if this library code makes its own network call or something - then you might want to mock it to save time.

Anyway, mocked unit tests become far simpler if you maintain the right focus. Use test A to call B (passing in canned fixtures if necessary), while B mocks C only to maintain focus on B's implementation. If you start getting involved in trying to mock external library code, or even internal private methods that B calls, you'll have a bad time.

The advantage to maintaining that kind of focus is that refactoring becomes easier. Want to change the name of C? Your IDE should handle refactoring your test, too. Want to change the implementation of B? You don't even need to change your test at all, just make sure the right values are still there in the return value. Maybe you'd need to add a couple of assertions, but that's it. If you're looking at having to do a serious refactoring of your unit test in those cases, then it probably just means you're still designing your code architecture and things are still really fluid. And in that case, it would make sense that you might have to throw away your test, because by definition it means you are still deciding on what your specifications are.
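Roughly what the A/B/C layering above could look like, as a sketch only. RateSource (C) and InterestCalculator (B) are hypothetical names; the test plays the part of A, and C is replaced by a hand-written stub instead of a mocking framework.

```java
// C: a custom collaborator that B depends on (hypothetical).
interface RateSource {
    double annualRatePercent(String accountType);
}

// B: the code under test. It asks C for a rate and does its own arithmetic.
class InterestCalculator {
    private final RateSource rates;

    InterestCalculator(RateSource rates) {
        this.rates = rates;
    }

    double yearlyInterest(String accountType, double balance) {
        return balance * rates.annualRatePercent(accountType) / 100.0;
    }
}

class LayeredTestDemo {
    public static void main(String[] args) {
        // The test stands in for A: it calls B the way A would.
        // C is a canned stub, so a buggy real C cannot confuse the result.
        RateSource stubbedC = accountType -> 5.0;
        InterestCalculator b = new InterestCalculator(stubbedC);

        // Only B's return value is inspected.
        System.out.println(b.yearlyInterest("savings", 200.0));
    }
}
```

Because only the return value is checked, B's internals can be rewritten without touching the test, which is the refactoring benefit described above.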

The guys who came up with the mock object approach to TDD would say that you shouldn't be mocking external libraries directly. You want your own time abstraction which is probably far simpler than what you get from a library that has to satisfy everyone's needs.

I think that building that level of isolation between you and your framework or library is just basic good practice. The fact that you need time doesn't change, but the way that you get it might.

And I'll say that approach sounds nearly absurd.

I'd rather have a program that does what it does in X lines of code, than a unit tested, mocked, codebase in 5X lines of code. Sure you have tests, for whatever they're worth (I'm somewhat skeptical of TDD in the first place), but you have so much MORE code.

It's just basic separation of concerns. I've seen too many development organizations brought to their knees by the fact that they don't have any layering between their logic and the libraries/frameworks they use. It's a very real problem.

That's interesting. I don't have much experience with it, but when I've seen similar stuff it looked like an anti-pattern to me. Why should developers need to learn your specific wrapper on top of a popular 3rd party component? The internal thing is most likely not documented as well, and common problems don't have answers on Stack Overflow. It requires extra work to use additional features of the library.

I'm similarly wary of convenience libraries that provide marginally simpler APIs on top of standard libraries.

I'm not convinced it's a good idea. I wish I had your experience. Any good reading material?

Chapter 8 of Clean Code.

Anything on the Adapter pattern.

Wrapping a library in your own concept allows you to define what's right for your application. It has the effect of pushing the third party library out to the edges of your system, replaced with whatever you wrapped it with. This makes replacing it, for testing or any other reason, much easier than if it's proliferated throughout your code unwrapped.

Wrappers should be simple so creating them and documentation, beyond a few integration tests to understand how the library works, shouldn't be a huge concern.
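A small sketch of that kind of wrapper, assuming Java and its standard java.time package as the "library" being pushed to the edge. AppClock, SystemAppClock and Subscription are hypothetical names invented for this example.

```java
import java.time.Clock;
import java.time.LocalDate;

// The application's own, deliberately tiny notion of time (hypothetical).
interface AppClock {
    LocalDate today();
}

// The only class that touches the library directly. Swapping libraries
// means rewriting this adapter, not the code that uses AppClock.
class SystemAppClock implements AppClock {
    private final Clock clock = Clock.systemDefaultZone();

    @Override
    public LocalDate today() {
        return LocalDate.now(clock);
    }
}

// Application logic depends only on AppClock.
class Subscription {
    private final LocalDate expires;

    Subscription(LocalDate expires) {
        this.expires = expires;
    }

    boolean isExpired(AppClock clock) {
        return clock.today().isAfter(expires);
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        Subscription sub = new Subscription(LocalDate.of(2014, 3, 6));

        // In production the adapter supplies the real date.
        System.out.println(sub.isExpired(new SystemAppClock()));

        // In a test, a one-line fake replaces the whole library.
        AppClock dayAfter = () -> LocalDate.of(2014, 3, 7);
        System.out.println(sub.isExpired(dayAfter));
    }
}
```

The adapter itself is too thin to be worth mocking; it is the sort of thing a couple of integration tests would cover.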

I've been doing a lot of JavaScript stuff lately and libraries like Dojo have 5 different ways to locate an element in the DOM. I have no idea if all these ways will be around in a future release, or, right now, which one is better. Unifying the DOM select code behind our own interface keeps things uniform throughout the app, instead of each of us using a different function, and lets us try out the different library functions, or different libraries, easily.

> Any good reading material?

"Working Effectively With Legacy Code", by your esteemed parent poster... :)

Let's say I'm using the libraries joda-time, log4j and junit. Currently I'm using them directly. How should I be doing it?

But where design comes into place is to determine where and when you need to separate these concerns. There are always times to do this. There are also times not to.

For example (Perl example here), in LedgerSMB we layer some things. We layer the templating engine. We layer (now, for 1.4) arbitrary precision floats. We layer datetime objects. Many of these are layered in such a way that they are transparent to most of the code.

But there are a lot of things that aren't layered because there isn't a clear case for so doing right now.

(As a footnote, PGObject requires that applications layer the underlying framework because there are certain decisions we don't feel comfortable making for the developer, such as database connection management.)

I agree. Wrapping your library for testing is really pushing the boundaries of sensibility.

Low density code that doesn't provide application logic is one of my pet peeves.

There is a philosophy that started in the 90s (and Microsoft was a proponent of it [1]) that adding more layers to an application would make it more malleable, but in a typical CRUD web app, layers only bloat the code and make it slower.

I'd suspect adding a wrapper to a date class just for the sake of testing is more likely to add bugs than remove them.

[1] http://www.asp.net/web-forms/tutorials/data-access/introduct...

Not exactly the same case, but it sure is nice to be independent of calling System.currentTimeMillis() in the code under test. One example of how to do it (in a simple case): "TDD, Unit Tests and the Passage of Time" http://henrikwarne.com/2013/12/08/tdd-unit-tests-and-the-pas...

That sounds more like an argument for putting an interface in front of an implementation on the app side, and then mocking that interface in the test. Which is totally fine, because then you are isolating the custom implementation (a light shell to the external library). As opposed to mocking the external implementation in the test.

That's more along the lines of what a lot of TDD literature says.

Write adapters or facades that wrap external libraries, use those in your own code's unit tests. This makes you less bound to a specific library as well. Don't mock the outside world [0] for testing your adapters/facades/whatever, but do integration tests that cover your adapters using the real outside world.

You'd also do complete end to end tests where the entire system is used as if it were in production (acceptance testing). TDD makes a lot more sense if you think of it in those three layers: acceptance, integration, unit.

[0] Outside world means anything that's not your code -- networking, filesystem, external libraries, etc.

> Should you decide that you don't want to use the DateTime library any longer you will have to not just change the code which is using it but the tests as well.

This is not a bad thing. Ostensibly, you're making a change to code which implements a business function for a reason. If you're not using DateTime anymore, there should be a reason why.

Tests are supposed to be the proof that a business function is performed correctly.

If date/time is such a fundamental piece of logic that you have many tests that verify various conditions based on what day or time it is, and you change the fundamental WAY you represent date and time, then why is it bad that you have to make a lot of test changes as well as code changes?

Having those tests, and changing them over, should be worth the effort to give you the confidence that your new implementation of a fundamental part of your code works the same way as before.

As far as the concept of checking "this method on this collaborator will be called X times," that can be painful to get correct the first time, but say you again change business logic and add a new condition. Is it not worth the effort to either think about "Oh, now I will have to call getHour one more time" and fix the test, or discover a broken test AND CONSIDER WHY you're being told "expected 2 calls but got 3" ?

Granted, if you're just changing times(2) to times(3) and going on, well, then don't bother. But that SHOULD be telling you something valuable. It's up to individual developers whether they choose to see that value or not.

Business logic doesn't require that you use DateTime. Business logic requires that the software understand dates in a certain way.

Some unit test styles enforce a given API at all levels in the code. Others restrict themselves to tests through certain restricted APIs.

For example, the SQLite code tests through the API, so it's possible to do major revisions of the code without changing the tests. The choice of how date logic works is invisible to the tests. Date tests can be done through SQL, without testing the actual code functionally equivalent to DateTime.

I prefer the latter. I liken it to building a bridge. The main test is "can you safely get traffic from X to Y", with other tests related to maintenance, beauty, environmental impact, and so on. This can start with a stone bridge, then a truss bridge, or a suspension bridge.

It's also possible to have tests like "there must be a pillar here, a truss there, an arch up there, and a screw over there", which requires that the bridge must be an iron bridge using a Pratt truss design, with three support columns made of granite.

Rebuilding/refactoring to even, say, a Parker truss would require rewriting some of the tests. Rebuilding/refactoring to, say, a truss arch requires a lot of rewritten tests.

But under the high-level tests, like for SQLite, major redesigns don't require major changes in the tests.

If at the start you know that you want a specific bridge, and you just have to work on it, then go ahead and write these very specific tests. Just remember that major refactors will require a lot of rewritten tests.

This makes a lot of sense to me. SQLite has the reputation of high reliability and thorough testing, maybe there is a lesson there.

In my opinion the value in testing is verifying what an interface should do, not how it does it.

SQLite gets it in part from very rigorous coverage testing. See https://www.sqlite.org/testing.html if you haven't already.

A hundred times yes. Actual tests that test an actual purpose are great. But I've seen tests that are something along the lines of

    var thing = new Thing();

There are some things you don't need to test.

There are cases where I do tests like that. There are two reasons for them.

1. It can save time debugging failures because I know whether the basic instantiation of thing was sane or not.

2. There are cases where there are complex object instantiation tests where an object is returned according to some other logic. Obviously these need to be tested.

In the first case those aren't tests for the sake of testing the API. They are sanity checks for saving time troubleshooting other test case failures.

On the other hand,

    my $thing = $thingloader->new($some_complex_data);
    ok($thing->isa('Specific::Kind::Of::Thing'));

is fine and useful.

Doubtful. That should be using dependency injection.

At the very most you should be expecting an interface, and not having the correct one should yield a runtime error within the code to begin with.

It would be a useful test for a dependency injection framework ;-)

That's what it is. You have a unified interface for generating things, and you need to have a deterministic test to make sure your thing is being generated correctly and the data generates the thing fully and correctly.

As with all programming practices, theory and application do not always align, especially when the application achieves cargo cult status.

The details of how MyClass calculates due dates should be irrelevant. The test in your example violates this. It makes privileged assumptions to assert implementation details.

This is why it's better to code and test against interfaces. This makes it harder to make any assumptions about the implementation because it puts the abstraction front and center. A test suite written against an interface can be applied to multiple implementations. This of course requires some dependency orchestration to load different implementations into the suite, as well as to create/avail any needed mocks for that implementation (which ideally are also declared as interfaces). But this limits ripples from implementation changes to the dependency orchestration code.
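As a rough sketch of one suite applied to several implementations: DueDatePolicy and both implementations are hypothetical, and the "dependency orchestration" is reduced to a plain list. The ten-day rule is an assumption for illustration only.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// The abstraction the suite is written against (hypothetical).
interface DueDatePolicy {
    LocalDate dueDate(LocalDate issued);
}

// Two interchangeable implementations of the same contract.
class PlusDaysPolicy implements DueDatePolicy {
    public LocalDate dueDate(LocalDate issued) {
        return issued.plusDays(10);
    }
}

class EpochDayPolicy implements DueDatePolicy {
    public LocalDate dueDate(LocalDate issued) {
        return LocalDate.ofEpochDay(issued.toEpochDay() + 10);
    }
}

class ContractSuite {
    // The checks know only the interface, never how a policy gets its answer.
    static boolean satisfiesContract(DueDatePolicy policy) {
        LocalDate issued = LocalDate.of(2014, 2, 25);
        LocalDate due = policy.dueDate(issued);
        return due.equals(LocalDate.of(2014, 3, 7)) && due.isAfter(issued);
    }

    public static void main(String[] args) {
        // Adding an implementation means adding one entry to this list.
        List<DueDatePolicy> implementations =
                Arrays.asList(new PlusDaysPolicy(), new EpochDayPolicy());
        for (DueDatePolicy policy : implementations) {
            System.out.println(
                    policy.getClass().getSimpleName() + ": " + satisfiesContract(policy));
        }
    }
}
```

An implementation change only ripples as far as the list in main; the checks themselves stay put.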

With that in mind, if an implementation's internals need testing, then tests can be written against the class itself (rather than any interface). If the class has dependencies, then it should be obvious (or documented) how/when those dependencies will be accessed. When necessitated by the class's outward responsibilities, this makes it possible to ensure that a dependency is correctly accessed in certain conditions. E.g. Perhaps it must be ensured that an order processor always asks for validation from a credit card gateway. Your example does not meet this standard.

Basic theory aside, I personally agree that high coverage is not worthwhile. I prefer tests that ensure bug fixes don't regress, or that are high-level and mimic end-consumer behavior.

Just want to agree with your sentiment here. I have tried on many occasions to incorporate mocking frameworks into my unit tests, and in almost all cases the mocks cause more problems than they solve. Either I'm spending time debugging the buggy behavior of the mock objects themselves, or my test writing is slowed down with a bunch of tedious minutiae about the expectations of the mock framework. For whatever reason I have also found the mock-based tests to be very difficult to refactor when required.

I generally prefer to write a mix of unit and integration tests, and I typically find the integration tests to be more useful.

I don't see the point of asserting that getHour has been called in your (admittedly contrived) example. Mocking (or rather stubbing) a datetime library is usually just useful to get a predictable result. Making assertions on what methods get called and how many times seems useless in that case.

Sometimes it can be useful to make this kind of assertions on a mock though. Say you're mocking a web service, you can use this kind of assertion to ensure you're calling the web service properly. In this case it's useful because it's verifying behavior at a boundary between your program and the outside world. That's what I think tests are for. The outside world is not made only of users. Communication with external resources is also important and can be verified with mocks. In other words, this kind of mock assertions is interesting to verify the indirect outputs of the program: http://xunitpatterns.com/indirect%20output.html

I agree. There is no point in adding coupling for its own sake by asserting that trivial methods are called.

Verification testing does however have its uses in terms of verifying that key interactions have happened in the way you expect.

For the most part it's used correctly; assertions are much more widely used than verification testing.

Regarding Mockito specifically, I think there are worse and better ways to use it. I think it's great for mocking out external services, like web services calls or database queries. That way, you can easily test a bit of functionality without having to instantiate the universe.

On the other hand, I've also seen it used overly invasively -- that is, the test writer uses it to verify the status of some object internal to the class under test, like your example above. As you note, it creates an incredibly tight coupling between the test and the method being tested.

In general, I think that this sort of thing is correlated with classes that are highly stateful. If, for example, your class is full of void methods that modify class members, then you may have no other option than inspecting the internals of the class in order to test its functionality. OTOH, if you can manage to write things in a more functional, less state-heavy manner, then you need less of this sort of introspection.
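To make the contrast concrete, here is a hedged sketch with hypothetical names (StatefulCart, Pricing). The first class hides its result in a member that a test has to peek at; the second expresses the same rule as a pure function.

```java
// Stateful style: a void method mutates a member. The field is left
// package-private only so that a test can inspect it afterwards.
class StatefulCart {
    double total;

    StatefulCart(double total) {
        this.total = total;
    }

    void applyDiscount(double percent) {
        total = total - total * percent / 100.0;
    }
}

// Functional style: the same rule as a pure function. Input in, value out.
class Pricing {
    static double discounted(double total, double percent) {
        return total - total * percent / 100.0;
    }
}

class StatefulVsFunctionalDemo {
    public static void main(String[] args) {
        // Testing the stateful version means reaching into the object.
        StatefulCart cart = new StatefulCart(200.0);
        cart.applyDiscount(25.0);
        System.out.println(cart.total);

        // Testing the pure version needs nothing but the return value.
        System.out.println(Pricing.discounted(200.0, 25.0));
    }
}
```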

That's hardcore unit testing.

If you don't mock out classes around the class you're testing, it's an integration test.

Yes, mocking too much is a problem. Perhaps the greatest benefit of unit tests is that they force you to write code that can be tested in isolation. In other words, your code becomes modular. When you mock the whole world around your objects, there is nothing that forces you to separate the parts. You end up with code where you can’t create anything in isolation – it is all tangled together. From a recent tweet by Bill Wake: “It’s ironic – the more powerful the mocking framework, the less pressure you feel to improve your design.” (from http://henrikwarne.com/2014/02/19/5-unit-testing-mistakes/)

> From a recent tweet by Bill Wake: “It’s ironic – the more powerful the mocking framework, the less pressure you feel to improve your design.”

This is exactly my sentiment. I would go even further, though, and say that "the more powerful the mocking framework, the more it discourages design improvements." If you have 50 or 60 references in your test to a mocked class of type A, and A turns out to no longer be needed, then the effort to get rid of it is higher than it otherwise would be.

I think the fundamental problem is that unit testing is hard to do right. It needs to be a part of software design, not coding and you should never write your tests to your code (but everybody does this anyway).

Note, I am very much for unit tests. However, I recognize that the basic test cases really are software engineering rather than development territory. So I think this whole argument points at a much deeper problem, and one I don't think test driven development is the answer for.

Let's draw a parallel to the Python midnight time as boolean discussion for a moment. The Python midnight time issue was entirely because the developer of the library made his code the contract. "This is how it works so hence live with what that means" is not really a contract and in that case it had some really nasty corner cases (timezones west of GMT never evaluate to false for any time with a timezone, but timezones UTC and east evaluate to false when the time represents UTC midnight). But a lot of people didn't want to break that contract. Really? That's a contract that could only be appreciated by tax lawyers.

The same fundamental problem happens with unit tests. "This code works so it must be correct" is a typical response and so the code becomes the contract. A bug is found, and fixed, and the test cases start failing. This is a waste of time. It is a waste of time to write the test. It is a waste of time to remove it because it is irrelevant.

My recommendation is to do extensive unit testing. However, take the unit tests seriously. Before you write a test, ask yourself "If a bug is found and fixed in the software or any of its dependencies, should this test ever fail?" If the answer is "yes" then don't write the test.

The point of unit tests is to ensure that software contracts are not mistakenly broken. This is true no matter how you write code. My test scripts are always longer code-wise than my production code. This is true for SQL and PL/pgSQL, just as it is for Perl. However, I try hard to ensure that a test failure means that a previously working API is no longer working rather than that I changed something.

That just sounds like a terrible functional spec. If it's a core part of the behavior that some mocked object is called 11 times then so be it, and so mock it. You're not introducing brittleness via your tests—you're introducing it because your functional spec is highly complex.

Is it important that you're testing the implementation interactions with DateTime? Why not ensure for the supplied DateTime that the expected result is returned?

I run into the problems with date-bound data all the time. It took me a while of doing it the hard way. Now, if I'm working with dates, I almost always have a 'DateCalculator' factory that injects dates into my domain objects. Now, it's easy to handle dates in my tests: just implement a DateCalculator as a mock.

But, as for TDD, I think people can go way, way overboard. There are serious diminishing returns to doing TDD 'the right way'. I'm quite satisfied using unit tests to test particularly complex business logic / domain logic and as integration tests. I find that all too often devs will get so enamored with their 100s of tests that basically test nothing (as a ratio of functionality to lines of code), that they'll forget the main goal: shipping relatively high-quality software.

Disclaimer: I started as a hack, worked for many years to become a non-hack, then devolved to a hack with caveats. Now, for me, it's all about achieving the right balance between speed and perfection. YMMV.

Edit: shipping [relatively high-quality] software

You've fallen into a common trap. A trap that many developers and managers fall into: believing that software is the goal.

It's not.

Software is a means to an end. A tool we build for users and customers to solve their problems.

Software that doesn't solve the customer's problems might as well be gravel on a beach.

Software that is shipped, but isn't maintained, will diverge from customers' needs as they change. And will become just as useless.

So are you trying to precisely understand a customer's key requirements? Then you best represent them in unit tests.

Are you building and releasing iteratively and getting customer feedback to make sure that customers are getting what they actually need? (A lot of times what people say they need changes once they actually get what they ask for) Then you need to be able to refactor frequently and fearlessly, and you best have tests.

Are you considering future releases after shipping major versions? Same, you will need to refactor, and you will break existing functionality, and you need tests.

I understand you want to be pragmatic. But don't think that the people that preach TDD are just sitting there in love with their best practices for their own sake. We have just been burned too many times and believe the best process and framework for delivering customer value via software is to build on a solid foundation.


On second thought, I think it's an interesting topic of conversation.

When I'm working, my primary goal is to make money. And, when I'm working for a client or boss or customer, that means being responsive and showing them value for _their_ money. Oftentimes, this means trying to protect them from themselves, but, again, I want to get paid. So, eventually, despite all my arguments, it turns back to me shipping software. They want to 'see' something for my expensive invoice, subscription fee, or salary. The payer doesn't care about all that 'stuff' that I do that they can't 'see'.

So, yes, you're right. I do sacrifice quality at times to ensure that I get paid. It's not intentional. No way. But, I am a biased participant in this debacle: I have a perverse incentive to ship non-perfect software. I don't intentionally try to do so, obviously. But, there is an equilibrium point where I stop arguing and see that I can fix it in v2.

Now, on my personal projects, I actually take my time. I do things the right way, because I have time on my side. I'm my own boss. And, in those cases, I actually do write full sets of unit tests.

Interesting for me to observe this in myself. Thanks.

But, yes, when I'm working for someone, I have considerable pressure to break the rules. You're right - it is a trap.

Fair play. I certainly know that pressure too.

"the main goal: shipping software." There is a point where people seem to forget that unit tests are a tool that has a place, and should be used appropriately. There is a point where it becomes more like a bureaucracy, or buying into a religion, where it's not about using the tests to solve problems, but about checking boxes or doing it the "right" way.

I like tests. When a section of code is complex, and can't reasonably be refactored, a test is a good way to help build confidence that you've driven the wonkiness out of it, and don't reintroduce it with further changes. They help you catch dumb mistakes, where you change how you're passing a reference, or the type of child object a function sees, and suddenly, you're getting subtly wrong output. I do a lot of Computational Fluid Dynamics (CFD), and we've got a standard suite we run when we compile to make sure new changes haven't broken our base functionality.

Still, in most cases, I'd much rather look for a refactoring or functionality combination option. I'd rather each piece be small enough that I can hold a mental model of it in my head, and not be too scared I'm missing something. However, I'm also still a bit of a hack.

I'm personally of the opinion that verify is harmful, because it does tie your test to your implementation. Unless that's absolutely needed (it probably isn't), it's a bad idea.

I use mocks more-or-less as advanced stubs, because when something has 600 methods in it (legacy code base fun) and I need one of them I don't particularly feel like generating that entire stub.

This is not a good example, because in this example, Mockito is used to mock a very simple class (DateTime), which should not be mocked at all. It is also used to verify that the dependent object is called exactly twice, which may be unnecessary.

This is more of a problem where a library is not used well. Mockito is a very good library; this example really does it a disservice.

Mockito easily leads to a false belief that code is tested. Mocking dependencies and simply asserting those mocks is useless yet I see that all the time.

Mocks are useful for providing state where it's queried from external objects. They are also great to verify that the intended external side effects of an action occur. verify isn't / unit tests aren't the right mechanism for forcing a particular implementation (which is what your example test does - why do you care if getHour is called, it doesn't update state in another part of the system?). Your tests should almost never say "this method on this mock will be called X number of times with this result", unless it's something that you actually care about (in which case it should be the entire subject of the test).

The above test is much easier without mocks. You construct a date, pass it in, then check that the result is equal to the input plus 10 days. That's all you care about, so that's all you test.
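Something like the following, as a sketch. The original example is not reproduced here, so this MyClass is only a stand-in: the calculateDueDate method name and the use of java.time.LocalDate are assumptions.

```java
import java.time.LocalDate;

// Stand-in for the MyClass discussed above; the method name is assumed.
class MyClass {
    LocalDate calculateDueDate(LocalDate start) {
        return start.plusDays(10);
    }
}

class NoMockDemo {
    public static void main(String[] args) {
        // Construct a date, pass it in, check the result. No mocks involved.
        LocalDate input = LocalDate.of(2014, 3, 6);
        LocalDate due = new MyClass().calculateDueDate(input);

        // Only the observable result is checked, not how it was computed.
        System.out.println(due.equals(LocalDate.of(2014, 3, 16)));
    }
}
```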

Verify is useful if you have to update external state. If, in your contrived example, for whatever reason, you needed to keep a tally of all of the times that due dates were ever calculated and that was stored somewhere else, you might have a test that mocks your CalculationCounterClass and verifies that a method (addCalculatedDueDate or whatever) gets called on there...
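A sketch of that tally case, using a hand-written recording fake in place of a mocking framework's verify. The CalculationCounter interface shape, DueDateService and RecordingCounter are hypothetical; only addCalculatedDueDate comes from the comment above.

```java
import java.time.LocalDate;

// The external collaborator whose state must be updated (shape assumed).
interface CalculationCounter {
    void addCalculatedDueDate();
}

// Hypothetical code under test: computes a date and reports that it did so.
class DueDateService {
    private final CalculationCounter counter;

    DueDateService(CalculationCounter counter) {
        this.counter = counter;
    }

    LocalDate calculateDueDate(LocalDate start) {
        counter.addCalculatedDueDate(); // the side effect the test cares about
        return start.plusDays(10);
    }
}

// Recording fake: does by hand what a mock plus verify would do.
class RecordingCounter implements CalculationCounter {
    int calls = 0;

    public void addCalculatedDueDate() {
        calls++;
    }
}

class VerifySideEffectDemo {
    public static void main(String[] args) {
        RecordingCounter counter = new RecordingCounter();
        new DueDateService(counter).calculateDueDate(LocalDate.of(2014, 3, 6));

        // The interaction is the whole subject of this check.
        System.out.println(counter.calls);
    }
}
```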

Good, simple (and IMO brittle) tests that confirm what you care about are useful. Forcing a particular implementation through your tests is a drain.

The problem with this is that the tests become obstacles in the way of refactoring the code.

I'm not a TDD stickler or anything, but tests are what make refactors even possible. Will tests catch all the issues? No, but they will make sure the part you refactored still interacts properly through its public interface.

If some tests have to be changed that is even better because those tests hopefully contain corner cases that have been caught and tested for over the life of the original code, and will force the person doing the refactor to think about those issues.

> I'm not a TDD stickler or anything, but tests are what make refactors even possible.

Perhaps in general tests provide a useful safety net for refactoring, but I think the benefits are often overstated. In any case those tests don't have to be unit tests.

If I'm working on the kind of system that would need mocks for comprehensive unit testing, I'd rather either use integration or high level functional tests if that kind of strategy is viable. At least then I'm still testing that my actual production code works.

If that's not possible, for example because I'm integrating with an external system where I can't test operations that would cause real side effects, I'd rather look for a different strategy entirely. For example, I might change the design to better isolate the bits of code I can control and test myself, I might (re)implement riskier areas in languages with powerful systems that will guarantee various classes of programmer error can't happen, or I might not have an automated test suite covering some code at all and rely on things like code review or automated code analysis tools instead.

If you need to modify the tests in parallel with the modifications you make in the code, you are more likely to introduce the same conceptual bugs in both places.

I have found one situation where these types of tests are useful to me. I'm working on a system full of legacy code and nearly devoid of tests. Most of the time a simple refactoring will make whatever feature or defect I'm working on much easier, but I'm loath to refactor without tests in place. The current design of the code doesn't allow for easy testing, so I'll write some tests using mocks to verify behavior. Then I can refactor with a little more peace of mind.

"Because companies rarely dedicate resources to making existing software better purely for its own sake, ..."

Well, there's your problem right there. Do not implement a crap workaround when you know what the proper solution is.

I actually liked mockito, when I was doing Java development a couple of years ago.

That just looks like a horrible test. I tend to argue that when you have to resort to using a verify on any mock then you are looking at something problematic.

The same argument could be made if you replaced the call to verify(...) with a when(...). The point is that mocking frameworks tightly couple the test code to the current implementation. That has downsides.

You are confusing testing with unit testing (an unfortunate name). Unit testing is a design tool, not a testing tool. It has more in common with UML than with Selenium.

You can in theory throw away your unit tests after you've implemented a piece of code.

It should be used for exposing weak design (tight coupling, for example), not for exposing nullpointer exceptions.

On the contrary. Unit testing is exactly for exposing that sort of thing. (Actually, I see more problems with cute off-by-one bugs than null pointers per se, but still.) They should not be thrown away.

There are ways to do it and ways not to do it, though. A good framework should be exhaustively unit tested. Individual components that use that framework should be lightly unit-tested (rather than having the framework heavily mocked) and more attention should be spent on integration/acceptance tests.

That sounds daft. One of the main benefits I can see with unit testing is that if you re-factor components below, then you can be reasonably sure it will work the same way afterwards. Why would you throw that away?

You have got it all very wrong. Not a single thing you've said is correct.

He may be confusing TDD with unit test, but he's right that unit tests tend to expose design rather than bugs. Unit tests are typically tightly coupled with the underlying implementation, and so unforeseen cases in the code (that will cause a bug) tend to be unforeseen cases in the test too.

Finding bugs is not the primary purpose of unit tests.

No, you are confusing unit testing with TDD. Unit testing is just testing units of code. TDD is the religious practice adhered to by some followers of Agile that is a "design tool" where you write tests first. http://en.wikipedia.org/wiki/Test-driven_development

>not for exposing nullpointer exceptions

If you are using a language primitive enough to have null pointer exceptions, then using unit tests to try to detect some of them is a perfectly reasonable thing.


It varies from project to project, but my flow nowadays is something like this:

1. Describe the behavior of the function in a readme with a code example.

2. Create a test that is mostly just a copy/paste from the readme code.

3. Write the actual function until it passes the first test.

4. Write a few more tests for any edge cases I can think of.

5. Get the function to pass again.

6. Encounter another edge case, goto 4.
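In Python, steps 1 to 4 can be sketched with the standard doctest module, where the readme-style example and the first test are literally the same text. The function `slugify` and the helper `run_examples` are made up for illustration; this is one possible way to follow the flow, not a prescription.

```python
import doctest


def slugify(title):
    """Turn a title into a URL-friendly slug.

    The examples below play the role of the readme snippet from step 1;
    doctest runs them as tests (step 2), and the last two are edge cases
    added later (step 4).

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Unit   Testing  ")
    'unit-testing'
    >>> slugify("")
    ''
    """
    # str.split() with no arguments drops leading, trailing, and repeated
    # whitespace, so the edge cases above need no special handling.
    return "-".join(title.lower().split())


def run_examples(func):
    """Run the docstring examples of func and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, name=func.__name__,
                            globs={func.__name__: func}):
        runner.run(test)
    return runner.failures, runner.tries


failed, attempted = run_examples(slugify)
print(f"{attempted} examples run, {failed} failed")
```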

It seems to work pretty well for me anyway. I think a lot of the debate really centers around unit tests vs integration tests (no one serious I know of is arguing for no testing, or all manual testing).

In my experience, you get the most bang for your buck with mostly unit tests in library code, and mostly integration tests in applications. Despite the difference in use cases, I think both sides talk past each other most of the time.

Unit tests all over the place in application code tend to be a waste, because the tendency is to test the libraries your application is using. Integration tests in library code are often terrible, because they are usually slower and tightly coupled to external libs.

I've been considering for some time writing the documentation for an app first, in a README restricted to Cucumber grammar and language, to see how everything develops from there.

We are looking at updating our version of Django in the next year, and I am quite happy that we have a battery of unit tests that test how the features of the current version of the framework work against our code. I will be a lot more confident about upgrading versions because of what might appear at first glance as 'wasteful unittests'.

Also, as a rebuttal to the author, how can you expect crappy developers who write crappy tests to write good code?

I have had to do some major refactorings in projects I've worked on. They would have taken a lot longer to get right without the moderate-to-decent coverage of the test suites. Unit tests are a tool, like anything else in software development, but they're invaluable if used intelligently.

Integration tests would be more useful, no?

The more I become experienced with testing, the less I think the difference between functional, integration and unit tests is important. What's important is to have a test suite with good coverage that runs fast. I found that a mix of high level and lower level tests is a good way to achieve this goal and I don't mind mixing them in the same files. High level tests provide coverage more quickly, but some stuff, such as verifying that an exception is handled properly, is easier to test with unit tests.

He probably meant integration tests. Generally the standard practice in Django is to do integration testing at various levels. 1) Test the model with an in-memory database, or real database with rollbacks at the end of every test. 2) Test the views, generally using the actual models and db. I don't ever use mocks much when testing a Django project.

It wouldn't be too much trouble to mock the model when testing the views, but I don't bother and I don't imagine anyone else does either.

Pragmatic unit test coverage is the way to go. Most of the summary advice isn't too bad but this statement is beyond absurd:

> • Throw away tests that haven’t failed in a year.

Apparently the author has never worked on a large software project with many versions supported over 5+ years. The value of unit tests increases over time as the software and libraries it depends upon change.

Heck, I work with software that is 10 years old, and the only thing that keeps us sane is very meticulous testing.

Seems like the author never heard of testing the tests: mutation testing.

Some of my comment here may be a difference in how we view unit tests vs functional or integration tests, and it may also have a lot to do with my use of Ruby rather than a language that needs to be compiled. But I hate having less than 100% test coverage; I find it unsettling.

One thing I've often said to sell people on the concept of testing: you're already testing. Guaranteed. If you're writing a complicated method, you're almost certainly calling it on the command line and checking output, right? Well, those things that you're writing to check it? Save them in a folder called unit tests. You've probably written some code to ensure that different bits of code work properly together? Save those in a folder called integration tests. You may be opening a browser to make sure that the app is behaving the way you expect. Save those tests in a folder called integration. All of this would make testing natural, rather than something you impose onto your project that feels contrived and vaguely forced.

I will admit that there is some overhead on all of this. Rather than just typing into a browser and checking the results, you'll need to figure out how to automate that process. Rails makes this quite easy in my opinion, and Rails does blur the lines between a few of those categories I mentioned above. I wouldn't worry too much about that - if you're writing tests that cover these scenarios, I wouldn't bother getting too deep into the semantics of testing. You're good.

I run a code coverage tool because I like to see what I haven't tested. Often, it turns out I left some code to rot and forgot about it. My gut feeling is that code rot is probably the most confusing thing to a new developer, and one of the hardest things to deal with (can I remove this? should I?) Leaving around obsolete code is kind of a burn on anyone who comes after you.

I have mixed feelings about TDD. I will say that when things are going swimmingly for me, I am often doing TDD. But it requires a clarity that I sometimes just don't have. More often, I'm writing tests and code essentially in parallel with each other (like, write a line of code, write a test, write a line of code, write a test). I suppose I could reverse them.

One last thing - I think that tests, especially the integration tests, should give you a very good sense of what an application does. In short, if you wanted to give someone a demo of your system, a walk through every case in your integration tests would probably be a very good way to illustrate what it does.

I've come to consider the main advantage of TDD is that you're more sure your tests are correct when you write a failing test, change a little bit of code, and then see the test pass. You then know the test is related to the changes you made, and that the test serves its purpose of preventing accidental regression of the new feature.

I have written unit tests around existing functionality, and the unit test passes as soon as I'm done writing it, but then I go mess up the code, and the test still passes -- turns out my test wasn't testing anything useful. To me, this is the problem TDD avoids.

You can also avoid the mentioned problem by writing production code first, and then writing the test, but then double checking that the test actually does fail when it's supposed to. I consider it important to see whether each test passes and fails in the right conditions, and this requires that I see each test pass and fail at least once. TDD does this, but so do other methods.

I also remember that the version control doesn't know what order I wrote the code in (unless I break up the commits). My team might be crazy into TDD, but as long as I check in code with good tests, they don't know whether I wrote the tests first or not.

Same here. In short, testing the tests by ensuring they break when they should is the most important part.

If I work on a piece of functionality, my first step is to break it and make sure some tests break too; only then can I start working on it.

You're not doing Test-Driven Development if you're writing code and then writing tests, though. You must write your test first, and only after you see it failing, write the code to make it pass. You have to stick to this cadence if you want to put TDD in practice.

Agreed... though I guess I gotta ask, what did I write that made you believe that I think writing code before tests is TDD?

I think your comment just made me understand what was nagging me about testing: writing tests is a form of decomposition. The method/function is complex/complicated, and therefore hard to reason about. The unit tests limit the scope of input so that, in particular instances, the method/function becomes simpler to reason about, enough that a test can be written.

Sadly, I feel there are things for which tests can't be written. Take a function that takes two integers and adds them together (the typical 2+2=4, therefore n+n=2n). Does this work when the integers are near-infinitely large (trillions of zeros -- theoretically possible with Python)? How would a unit test validate this?

If you wanted to test all combinations possible, you would have to brute-force until the end of the universe, or until the machine runs out of hard drive space (or SAN space), whichever comes first. If you wanted to take a statistically significant sample, you would only have an elevated level of certainty, not an absolute level of certainty.

I think that what the author is pointing at is that, like mathematics, which, let's face it, the human brain is much better at than computers, programming is best done in the human brain, and that once the human brain has satisfied itself of the proper functioning of the program, the coding becomes simple.

And the program no longer needs to be decomposed, because it is understood as a whole.

Random property-based testing is useful for that sort of thing, as used in the QuickCheck library (most popular in Haskell/Erlang but there are ports to pretty much every language in existence). One property you could use for your example would be to have a simpler, slower reference implementation that you trust and test that it produces the same output as the one you've written. This isn't a verification but is very effective at snuffing out bugs.
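As a rough illustration of the reference-implementation property, here is a sketch that uses only Python's standard random module rather than a QuickCheck port, so there is no shrinking or clever input generation. The function under test (`add_decimal_strings`) and the other names are hypothetical; the trusted reference is Python's built-in arbitrary-precision integer addition.

```python
import random


def add_decimal_strings(a, b):
    """Implementation under test: schoolbook addition on non-negative decimal strings."""
    digits = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += ord(a[i]) - ord("0")
            i -= 1
        if j >= 0:
            total += ord(b[j]) - ord("0")
            j -= 1
        digits.append(chr(ord("0") + total % 10))
        carry = total // 10
    return "".join(reversed(digits)) or "0"


def reference_add(a, b):
    """Trusted reference: Python's built-in integers."""
    return str(int(a) + int(b))


def check_against_reference(trials=500, max_digits=60, seed=0):
    """Compare both implementations on random inputs.

    Returns the first counterexample found as a tuple (a, b), or None.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a = str(rng.randrange(10 ** rng.randint(1, max_digits)))
        b = str(rng.randrange(10 ** rng.randint(1, max_digits)))
        if add_decimal_strings(a, b) != reference_add(a, b):
            return (a, b)
    return None


print("counterexample:", check_against_reference())
```

This raises confidence across a wide range of input sizes without proving anything, which matches the "not a verification" caveat above.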

But you are right that testing does not verify the correctness of code (unless you are writing something like a boolean -> boolean function), but often property-based testing is a stepping stone to verification since there is a quicker feedback cycle between write code->find bugs->fix code->repeat than the equivalent write code->fail to prove it works->wonder if its broken or if you just don't know how to prove it.

Suppose you write a method that takes two inputs and returns the sum. What would you do next? You'd probably do something like you just suggested, verify that when you add 2 and 2 here, you get 4. So, save it. This isn't a complete or perfect test. In fact, this particular test is risky, because later, when someone decides that addify should multiply, your tests will still pass. I'm not going to pretend that this solves your problem. But if you had written 5, 2, and 3, it would have protected you from that later scenario. And IMO, it's not the dev's fault who turned addify into a multiplication function - how is he or she supposed to know it broke the code if there aren't any tests?
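Spelled out in Python, with `addify` as the hypothetical function from this comment and the other names made up for illustration:

```python
def addify(a, b):
    """Hypothetical function from the comment above: returns the sum."""
    return a + b


def addify_after_edit(a, b):
    """The same function after someone decides it should multiply."""
    return a * b


def check_2_2_4(func):
    # 2 + 2 and 2 * 2 are both 4, so this check passes for both versions.
    return func(2, 2) == 4


def check_2_3_5(func):
    # 2 + 3 is 5 but 2 * 3 is 6, so only the adding version passes.
    return func(2, 3) == 5


for check in (check_2_2_4, check_2_3_5):
    print(check.__name__, check(addify), check(addify_after_edit))
```

Only the second check tells the two versions apart, which is the whole argument for picking inputs where plausible wrong implementations disagree with the right one.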

To me, though, the bigger problem is when you're trying to figure out what your code is supposed to do by writing code, and then shoehorn that process into TDD. Like, write down 42, then make it fail, and then make it pass.

If you wanted to test all combinations possible, you would have to brute-force until the end of the universe

You only need to test for a combination of all the input variables that affect the execution flow. However, to do that properly, you need to know the flow of the methods your method calls. (E.g. you would need to know that substring throws an exception if the string is shorter than expected.) This kind of information could be captured in some kind of meta-data that could be propagated up the call chain for testing.
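A small Python sketch of what "one input per execution path" can look like, including a path that lives in a called operation rather than in the function's own branches. The helper `first_initial` and the other names are hypothetical, and string indexing stands in for the substring example.

```python
def first_initial(name):
    """Hypothetical helper: '?' for None, else the upper-cased first character."""
    if name is None:
        return "?"
    # Indexing raises IndexError on an empty string. That path lives in the
    # indexing operation, not in any `if` written in this function.
    return name[0].upper()


def outcome(value):
    """Return the result, or the name of the exception type that was raised."""
    try:
        return first_initial(value)
    except Exception as exc:
        return type(exc).__name__


# One representative input per execution path, including the hidden one.
PATHS = {
    "explicit None branch": (None, "?"),
    "normal path": ("ada", "A"),
    "path hidden in the callee": ("", "IndexError"),
}

for label, (value, expected) in PATHS.items():
    assert outcome(value) == expected, label
```

The third case is only discoverable if you know how indexing behaves on short input, which is the kind of information the meta-data idea would have to carry up the call chain.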

But then you still wouldn't know that the function could be used for any two integers, including those in the trillions of zeros.

Also, you would have to anticipate what possible uses it could be used in, and that's impossible to know with certainty, so how could your tests be accurate?

Well, you wouldn't be able to conclude that your method works for any possible input. You'd have a verification for how it is supposed to work for a predetermined set of inputs.

Think about the situation above, where you're dealing with a method to add two integers. So, you test 2, 3, and 5. It passes. Later, you decide that this method should multiply rather than add, so you change it. Your unit test breaks.

In my personal experience, about 95% of the time, I want to keep the modification, and so I need to update the test. But every now and then, I realize that the test is accurate and I have introduced unintended side effects into my code.

That's just me developing for me. The tests are also very important when a new developer is working on the app. If they change something, they need to know if they've broken anything downstream.

It's not failsafe but I do think it's a huge improvement over no tests.

If they've broken anything downstream... Well that would mean that something downstream is using it.

This, to me, would mean that the app is being used as an API. API rules need to be applied (don't change existing versions, etc). That's something OOP is very bad at, I think.

Using the Unix philosophy or not adding to existing programs would also alleviate the need for regression testing, no?

"More often, I'm writing tests and code essentially in parallel with each other (like, write a line of code, write a test, write a line of code, write a test). I suppose I could reverse them."

In order to do TDD, you have to do it the other way around, no backsies. The practice is very dogmatic about this. It's kinda tricky but all you have to do is write what you wish you had.

Huh. I intended that bit as a specific example of a non TDD approach, where you don't write tests first. So I'm not surprised that you identified it as a non-TDD approach, I'm just a little confused about why you thought I was presenting it as an example of a TDD approach.

Guess that wasn't clear. Oh well.

Isn't this still TDD though?

My understanding was that writing tests first is known as Test-First development which is a subset of TDD. TDD allows you some flexibility to write code first as long as the tests 'drive' the development process?

No, you shouldn't write code first at all. As soon as you do, you break TDD. You have to write your test first. The practice is pretty dogmatic about this.

The trick to writing a test first is that you have to write what you wish you had. It's kinda like Composed Method in that way.

Jesus... All these rules..

I'll stick with the smalltalk adage: code a little. Test a little.

No need for dogma.

I like it ;) Might borrow that one

Source code is a poor way to specify the behavior of a system. It is complex, verbose, and difficult to reason about without proper language abstractions at appropriate levels. Worse still are machine optimizations obscuring intent as the software evolves, work-around "patterns" for inferior PL designs, and other savageries inflicted upon us. It is highly unlikely you could give a brilliant programmer a few hundred thousand or millions of lines of code and ask them what the program does in an afternoon let alone ask them to find the bug in it.

Good unit tests provide the specification for a certain low-level of the code to a developer. You will never catch everything but their purpose is not to test for absolute correctness. Their purpose is to specify, for a given range of expected behavior, what the boundaries are. If I write the test first in earnest I am telling my future maintainers what I intended to write. We can execute that intention against what I wrote and make assertions about whether it works as expected. That's all I expect unit tests to do.

Formal correctness proofs for all possible inputs are way out of my league and beyond the scope of almost every project I've encountered save for automated proof verification software.

You should write unit tests and you should write them first. They are the executable specification you will run your program against to ensure your software behaves in the manner in which you intended it to. With solid, well-thought out tests guiding your design you should be able to optimize and refactor your software over time and see that it still behaves as expected in the face of change. And you should be able to hand a suite of well-written tests to a new developer and expect them to understand what a given piece of software does in a day (perhaps in even less time for smaller pieces).

I agree with everything you've said except the line about formal correctness proofs. I'd in fact be willing to replace "unit tests" with correctness proofs at almost every point in your comment and stand behind it myself... given that I get to define "correctness proof" carefully.

In particular, I want to appeal for using "the most formal thing reasonable" whenever possible. If you're writing highly abstract code, you would likely be amazed at how simply you can write down and prove its behavior. For less abstract code, some formal properties may still be available and provable, some available but difficult to prove and better verified using exhaustivity or probabilistic property checking [0], and finally some are simply unknown.

My point is that unit tests are one of the very weakest forms of verification. They should be traded for more powerful techniques at every availability.

Availability is, today, still an issue, but this is precisely why dependently typed language research is interesting. In Idris you can write your core code alongside a machine-checked proof of its correctness. You can verify your outside interface by nearly autogenerating an interface fuzzer and checking encode/decode roundtrips or system fixed points. Finally, you can compile the whole thing down to JavaScript and deploy it atop Meteor if that's your endgame.

Idris is just one interesting point on this spectrum. I think availability of formal correctness proving (in capability, expression, organization, and automatic execution) will be a major force in the future of CS.

[0] http://en.wikipedia.org/wiki/QuickCheck

Maybe it's because Lisp has warped my brain but I'm not entirely sold on dependent type systems and formal proofs yet.

It will be interesting to see if there's anything worthwhile that falls out of it as a consequence of such research but I don't think it's very interesting in its current state. It could use more exploration and I look forward to seeing what comes of it.

The way I see formal correctness proofs is that they'll be useful, but not a panacea.

If I have a method that takes a List as a parameter, and returns an Integer, then congratulations - I have just written a formal proof that it is possible to derive an Integer from a List.

However, the business need wasn't to derive an Integer from a List - it was something far more specific.

So that's why people like the idea of dependent types, in that you can make the type system (mostly) Turing complete, and actually make the types reflect the specification of the problem.

And then if you write an implementation of that specification, congratulations, you have proved that that implementation is correct.

Which is pretty cool, except that it overlooks the fact that a very large percentage of our bugs are not in our implementation, but in our agreed-upon specification. Almost every bug I encounter (that sneaks its way to QA or production and is reported as a bug) is due to the fact that I validly implemented a specification one way, when the business need actually wanted a slightly different interpretation of the specification, but just wasn't specific enough when they described it.

Anyway, formal correctness proofs won't handle that. Really all it means is that the hard part of programming - making sure we're actually developing the correct thing - will be offloaded from implementation to specification. That means that implementation will be more of a code monkey role, while the real art of programming - interpreting and anticipating needs, a role that programmers are vastly underappreciated for - will need to be accomplished by someone that is sort of like a product owner, but far more technical and exacting than is the norm now. I'd imagine many senior level programmers would move into that role - translating business needs into formal specification, and then leaving junior programmers to monkey with the proof itself.

I don't see formal proofs replacing acceptance tests. I see them helping you build leak-proof abstractions upon which you assemble the domain where your functionality is implemented. The line between those two positions is blurry and, despite the sound of it, your need to be comprehensive with a formal proof also varies due to possibility and value.

All of these things are just tools after all—you use them to produce value. I think you've described one potential future with this technology, but it, I feel, is very narrow.

One thing that does seem to be playing out is that for many interesting programs the establishment of the theorem is sufficiently challenging that the proof need not be relegated to "junior programmers" but instead automated entirely. Of course, that said, typically a primary tool in the search for a well-specified theorem is a long series of failed attempts to prove its poorly-specified predecessors.

Heh, I find it the next step in evolution after Lisp. All the same flexibility, but with a new language that lets you delineate and work with the meaning of your data alongside the data itself.

I don't think it's anything resembling practical in its current state, but I think it's absolutely the way of the (possibly deep) future. I want to spend more of my time programming with types than programming with values since that's where my brain honestly is (despite the type languages of today being a bit obtuse) and since the compiler is often able to program the value-level stuff for you.

Spot on.

Unit tests aren't there to ensure that you get the right answer. They're there to ensure that you've asked the right question.

Or at the very least to document the thought process of the person implementing code.

Use static analysis tools + code reviews to improve correctness, and unit tests to guide design + document.

Unit tests aren't for asking the right question. The code is there and readable (hopefully), so that shouldn't be an issue. Unit tests are there to ensure that when you need to make a change you don't introduce a regression (moving code backwards), such as missing something that the function was previously doing.

They aren't useless and have a clear place in testing. Unit tests alone can't guarantee a fully tested system, but they minimize engineer error when working on various pieces of a code base, but this is predicated on cleanly separated and testable code pieces.

Can't you both be right? Another facet: unit tests are the living, useful documentation of the code. Starting on a team with a large legacy code base, unit tests are the tutorial on the best practices of the API. This is magnified if the "unit" tests include full functional tests with an embeddable container (e.g. OpenEJB).

(apologies for parroting the gp)

Code is never readable. The "self-documenting code" mantra can only be applied to edge cases and toy problems. This is simply because the human brain is really bad at reading code. The level of abstraction optimal for human consumption is so high that reading precise instructions is a pain.

I have been trying to add unit tests to my app, but almost all the tutorials you find are useless. They show you how to test 1 + 1, or how to test the framework I am using. The parts of the code I am almost certain will work. None of them show you how to actually test anything useful.

I wrote a book called 'Working Effectively with Legacy Code' that's really all about testing code that is not easy to test - real apps beyond toy examples.

There are techniques in the book but really you should try to write tests for code as you write it. It's the best way to make sure that your code is decoupled enough for testing. If you do that, you don't need the techniques in the book.

This book is legendary! This and Uncle Bob's book Clean Code. A must-read for every programmer!

You're the author of that!? I love that book, and even though I stick to its practices less than I probably should, it was incredibly helpful.

I've heard many good things about this book, definitely on the reading list!

I read many tutorials and several books on TDD but never quite got it until I read Kent Beck's Test Driven Development By Example[1]. I also highly recommend his Screencast series on pragprog.com[2] which gives you a good flavour of the workflow he uses with TDD.

Edit: For those that don't know Kent Beck is basically the godfather of TDD.

[1] http://www.amazon.com/Test-Driven-Development-By-Example/dp/...

[2] http://pragprog.com/screencasts/v-kbtdd/test-driven-developm...

Personally I can relate to Kent Beck's approach to testing rather than "test everything":

"I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence (I suspect this level of confidence is high compared to industry standards, but that could just be hubris). If I don't typically make a kind of mistake (like setting the wrong variables in a constructor), I don't test for it. I do tend to make sense of test errors, so I'm extra careful when I have logic with complicated conditionals. When coding on a team, I modify my strategy to carefully test code that we, collectively, tend to get wrong."


Yeah, I think this comes with experience. I started off testing everything but over the years you get a feel for which tests aren't going to add value over the lifetime of the project.

I have also read Test Driven Development By Example. It really opened my eyes to how TDD is supposed to work. It's quite well written. Most of what I have seen on the web does not explain it to any sufficient level.

Testability falls out of good design. Testability is not worth optimizing for on its own, though. It's just a nice side effect.

Read Growing Object-Oriented Software Guided By Tests. The authors walk through the development of an auction sniper.

The "testability" of a piece of software doesn't really have much to do with the discussion about the quantity or kinds of tests that should be written for it.

Every test is another thing that has the potential to require debugging.

So true, debugging "tests" is infernal. People always tell you testing should be easy and quick; it is neither.

You might like http://blog.testdouble.com/posts/2014-01-25-the-failures-of-...

It's a tad long, but worth it. tl;dr: the first half agrees with you, and the second half describes a process that does help you test useful stuff (and design your software decently along with it)

I have been adding integration tests. It's an in-house app, so constantly changing, and constantly in beta.

I wanted to start adding unit tests a couple of years back, but looking back, I don't think it would have gained very much, and cost a lot of time (creating the data to test the parts).

If there is a functional unit that does one specific thing, I could see it being useful to test. But most of my app basically queries a database and displays the results. The bugs I do get would rarely have been caught by unit tests.

Well, you have missed the point of TDD. It exists to sell books, seminars, training and consulting. Not to help in the efficient creation of useful software.

Look for TDD tutorials for less trivial examples of what you should test. You want a general 'how do I know what to test' tutorial, rather than 'how do I use testing framework X' (which is what you were probably reading, if they show you how to test 1+1).

Other than that, just be explicit about your api contracts and test those.

The idea of making the software worse to increase some arbitrary measure of "maturity" is sad and amusing. It makes me think of the saying "what gets measured gets done." It seems it would make more sense for management to be serious about things that actually matter, like customer satisfaction, user experience, productivity, and defects in production.

Unfortunately, the word "testing" somehow gets equated with the quality of the software, which it is not, not only by managers but also by those who have been taught to read about it and forced to repeat back its list of advantages. It's a false sense of security found in a process that has minimal impact on the software's performance and quality.

When you write crappy source code that performs badly, writing all the unit tests in the world won't save you. You simply increase the technical debt on future changes in the source code or design of the software.

Simply keeping things as simple as possible, focusing on quality and incrementally verifying along each step of the way saves far more time.

It always puzzles me how quality can be increased by writing more code around existing code of questionable quality.

> 1.5 Low-Risk Tests Have Low (even potentially negative) Payoff 

That's the key point for me -- does this test add sufficient value to the test suite to justify its creation and existence? If it's checking to see if a value got copied over from one object to another, then no. If it checks some logic, then probably yes. Is the code under test used in many many places? Then test the hell out of it.

I used to be a diehard test fanatic, but then I grew up and realized that, just as SQL isn't suitable for every task, however double-plus-good it is, neither is TDD. Some things I don't unit test:

That clicking a particular button fires a particular event. Syntax-wise, setting up the event handler was essentially an assignment, and testing the physical clicking of the button was the job of whoever wrote the operating system's GUI system.

Similarly, that setting a field actually persists the value. Assignments are easy to get right. They get wrong when I'm naming my variables poorly. Unless that field is actually a property that is really syntactic-sugar for a method call, in which case it is necessary, so test away.

That values adhere to certain type properties. If I have to write a test to check type information at runtime, then I'm not using the type system in my programming language (be it static or dynamic) correctly.

I do not use mocks to be able to test stateful subsystems. The whole mock system creates a maintenance nightmare and makes changing APIs extremely brittle. It's far better to have scripting in place to build a base test system from scratch and walk the program through a set of steps, checking state transitions after each step. In other words, it all becomes one big, stateless test.

The added benefit: such scripts tend to help figure out deployment issues before they happen.

I try to use stateless designs for everything. Method calls do not mutate in place; they create a new object and return the result. This is not only safer, it's infinitely easier to test, as you automatically have both halves of your state transition, rather than having to clone the initial state before calling the mutating function.
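A rough sketch of what that can look like, in Python. The `Account` type and `deposit` function are made up for illustration and are not from any real codebase:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable value object, used only for this sketch."""
    owner: str
    balance: int  # balance in cents


def deposit(account: Account, amount: int) -> Account:
    """Return a new Account; the argument is left untouched."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return replace(account, balance=account.balance + amount)


def test_deposit_returns_new_state() -> None:
    before = Account(owner="alice", balance=100)
    after = deposit(before, 50)
    # Both halves of the transition are available without cloning.
    assert before.balance == 100
    assert after.balance == 150
    assert after.owner == before.owner


test_deposit_returns_new_state()
```

Because `Account` is frozen, the test never has to copy the initial state before the call.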

Sometimes people say to me "unit testing!". And I say "great, of course you are already using a strongly typed language that the compiler can statically check, that'll give you loads of tests for free!"

Then they stare at me blankly because they're Javascript or Ruby guys who are only playing at being serious software engineers.

Probably very close to the way I stare at people who exclaim "Yay! It compiles! It must work!"

I predate the popularity of unit testing, so I've developed some pretty good skills understanding the problem domain and carefully coding for it. So yeah when I did get my code to finally compile in Modula-2, it usually worked pretty damn well.

Way too obvious a troll. I give you a 1/10.

"If you find your testers splitting up functions to support the testing process, you’re destroying your system architecture and code comprehension along with it. Test at a coarser level of granularity."

What? Splitting up large functions into smaller, well-named units destroys code comprehension?

If your code turns into 30 one-liners with 30! unforeseen possible calling combinations thereby exposing an API of irrelevant auxiliary functions to the outside world? Yes.

I think there's a bigger design problem then, e.g. those functions should really be classes. I'm just talking about doing a few 'extract method' refactorings.

I sometimes find that code which doesn't involve a lot of nesting is a lot easier to read and comprehend as one long method / function / subroutine than when it is chopped up into multiple small methods and I have to jump around trying to work out what is happening. There is often nothing to be gained by splitting it up, in my opinion.

Is that not why some languages are known as scripting languages, because you can read them like a script?

I see what you mean. I'd suggest that longer, more descriptive method names can help here, though. If I'm referring to a method body to understand what it does when reading another block of code, the method probably doesn't have a descriptive enough name.


Unless the rewards of crapping-up the code are very high, I find this result highly unlikely.

Happens all the time with TDD. The worst case scenario is when the code is split up into tiny interfaces that don't represent anything, and the implementing classes have a myriad of tiny methods, which also don't represent anything. App logic is effectively composed at run time, and impossible to analyze by looking at the code. Every single method is unit-tested, but their interactions are not (yay, mocking!).

I liked the part about turning unit tests into assertions. Too bad most of us (including me) are programming in Java, rather than something like Eiffel.

I really liked the "design by contract" stuff built into Eiffel when I read about it years ago (before there was such a thing as Java).


Contrast the notion of a class invariant with the notion of a Java Bean, which is constructed as a useless empty piece of crap, and is eventually mutated enough to represent something useful. Of course, this rant takes us into immutability in general...
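For what it's worth, here is a minimal sketch of approximating a class invariant and a postcondition without Eiffel. It is in Python rather than Java, and `DateRange` is a hypothetical type invented for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DateRange:
    """Hypothetical value type whose invariant is start <= end."""
    start: int  # day number; an assumption to keep the sketch short
    end: int

    def __post_init__(self) -> None:
        # Class invariant, checked once at construction. Because the
        # dataclass is frozen, it cannot be violated afterwards.
        if self.start > self.end:
            raise ValueError("start must not be after end")

    def length(self) -> int:
        result = self.end - self.start
        # Postcondition written as an assertion rather than a separate test.
        assert result >= 0
        return result
```

Unlike a bean that starts out empty and is mutated into shape, an instance of this class is either valid from the moment it exists or it is never created.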

Another client of mine also had too many unit tests. I pointed out to them that this would decrease their velocity, because every change to a function should require a coordinated change to the test.

This is one thing I have never seen replicated outside of Smalltalk projects: Automated refactorings of code automatically applied those refactorings to the tests.

The tests in my largest code work at a very high level. They are a number of example documents, chosen to exercise particular features, which are processed by the program and compared with known output. The tests have been pretty effective at spotting changes in behaviour and crashes, so I've found them useful. I'm not convinced that any lower level object or function verification would be particularly helpful, but of course programs vary in their structure. Treating the program as a black box with a variety of functions is a helpful way to test I believe.
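A minimal sketch of that kind of example-document test, in Python. The `render` function stands in for the real program, and the `.in` / `.expected` file naming is an assumption made for this example:

```python
import tempfile
from pathlib import Path


def render(document: str) -> str:
    """Stand-in for the real program: upper-cases each line (hypothetical)."""
    return "\n".join(line.upper() for line in document.splitlines())


def check_examples(example_dir: Path) -> list[str]:
    """Return the names of example documents whose output changed."""
    failures = []
    for source in sorted(example_dir.glob("*.in")):
        expected = source.with_suffix(".expected").read_text()
        if render(source.read_text()) != expected:
            failures.append(source.name)
    return failures


# Usage: build a throwaway example directory and run the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "headings.in").write_text("title\nbody")
    (root / "headings.expected").write_text("TITLE\nBODY")
    (root / "broken.in").write_text("abc")
    (root / "broken.expected").write_text("abc")  # deliberately stale
    changed = check_examples(root)
```

Here `changed` ends up listing only `broken.in`, the one document whose known output no longer matches.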

Agreed. I've found system testing, using things like HtmlUnit, of things the user sees to be a much better use of time than testing individual method / function / subroutine calls.

For some types of applications, you might have to make your own testing tools, though: e.g. - compiler/tokenizer might need a tool to dump & examine the generated binary; batch job file-watcher & scheduler might need a tool to supply input files and manipulate apparent system time; ...
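As one possible illustration of the "manipulate apparent system time" tool, here is a small Python sketch. `is_due` and `FakeClock` are hypothetical names, and injecting the clock as a parameter is just one way to do it:

```python
from datetime import datetime, timedelta
from typing import Callable


def is_due(scheduled: datetime, now: Callable[[], datetime] = datetime.now) -> bool:
    """Scheduler check that asks an injectable clock for the time."""
    return now() >= scheduled


class FakeClock:
    """Test tool: a clock that the test can move forward at will."""

    def __init__(self, start: datetime) -> None:
        self.current = start

    def __call__(self) -> datetime:
        return self.current

    def advance(self, delta: timedelta) -> None:
        self.current += delta


clock = FakeClock(datetime(2014, 3, 6, 9, 0))
run_at = datetime(2014, 3, 6, 10, 0)
assert not is_due(run_at, clock)
clock.advance(timedelta(hours=2))
assert is_due(run_at, clock)
```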

Some code is very hard to test unless you spend a great deal of time making a test driver. I once wrote some 1000 unit tests all by myself for a project I was working on. It was painful to keep updating the code, no matter how much time I put into abstraction. At some point a big feature blows things away and tests start to fail.

Some systems are hard to test. Unit testing anything that involves spawning processes is hell. I can mock/stub, but it takes a lot of effort to do that. Nowadays I go with functional tests as much as possible. If I can't even pass my functional tests, how on earth do I ship my code upstream? Sometimes unit tests don't reflect problems in the real world, and the hours I have to spend on making unit tests pass could be used to make a few more functional tests more robust.

Does anyone know the unit test coverage for software like puppet/chef/ansible? A year ago when I checked ansible, the number of unit tests was very small. Most of the time functional tests look more promising than unit testing in that kind of complex, process/system-interactive software.

"Premature optimization" has become something of an anathema in the coding world, why isn't "premature testing" so scrutinized? Back in the 90's everything was over-optimized because, well, it made for more interesting work than coding to requirements and it showed off how smart you were. It seems like unit testing is now the "interesting" thing to show off your intelligence—"how can I abstract this function to allow me to inject a mock that asynchronously...?"

And the sublime thing is that the unit test will always be there to show off the original writer's intelligence, just like the crazy cool optimization in the 90's. And while it works perfectly with the developer's idea of how things should work, there's a disparity between that and the real world, and as soon as that disparity is revealed, it's suddenly a lot more work to wade through all the cruft than it would have been if the requirement had just been implemented in the most straightforward fashion.

A few years from now "premature testing" will be a thing.

Makes a lot of sense really, and the parallels are there: "First, design without (optimization/testability) in the most straightforward fashion such that it works and the code can be reviewed in a straightforward manner. Only then, decide if some increased (optimization/testability) is worth the extra complexity and implement as necessary." Pragmatic not dogmatic.

Yes, poorly written or tautological tests are useless. But I think we already knew that.

I primarily use unit tests as a way to clarify my own thinking about a function. It helps to step through the process of what the function should return given what parameters it is given. Why is that psychological crutch useless? Is your argument that we all should work the same way? Can't we do whatever works?

> Yes poorly written or tautological tests are useless

> It helps to step through the process of what the function should return given what parameters it is given.

Congratulations, you just described tautological unit testing. Unit testing should be about guiding design and describing the behavior of the subject under test. Focusing on input-output testing is exactly what leads to the "unit testing is worthless" blog posts.

That glosses over things too much.

Tautological would be like testing the getters and setters of a DTO. Unnecessary.

But testing the return value of a function that accepts parameters is not tautological if the function actually has a nontrivial implementation.

Not sure at all what you mean. Can you describe what "guiding design and describing the behavior of the subject" looks like from the unit test's perspective?

How else are we to specify the behavior of a function than to specify, for a given combination of typed parameters, what is the return value?

As the article points out it is impossible to exhaust the space of all possible combinations - but nobody is trying to do that, so it is a straw man. For me, I test the branches that are likely to occur in a production setting, and any obvious edge cases that might occur, but infrequently.

> Can you describe what "guiding design and describing the behavior of the subject" looks like from the unit test's perspective?

That's the first part of unit testing. You write a test that fails. By doing that, you are describing what the API should look like. The unit test guides the design of the interface, and how that object works. People do this, even if they aren't unit testing. You've done this if you've ever written some code calling a method, knowing that the method isn't there yet. But you stub it in, and then go write the method.

Right. And that's what I, and most other people who unit test, do. The unit test says nothing about how the object is implemented though - that's your job. As far as the unit test is concerned, the function is a black box. The unit test isn't concerned with the details of the implementation.

For testing side effects (like the function's effect on a database) you sometimes work outside the function's parameters and return values. But the point remains - you are looking at what behavior the function exhibits from outside it.

Just a nitpick: that's TDD, not unit testing in general. When unit testing guides the design, it's usually called TDD.

It's perfectly acceptable to write unit tests _after_ the production code, and to not have them guide the design.

You are right, of course. And that's what I meant. I lump unit testing and TDD together so often that I tend to (incorrectly) equate the two.

When you're implementing an algorithm (one of the places I'm most fond of making sure I have tests), focusing on input/output testing is precisely what you want to do. Partition your inputs into different ways the code needs to deal with them and test for each partition.
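As a sketch of what partitioning the inputs might look like, take the classic triangle classifier. The function and the choice of partitions are illustrative assumptions, not taken from any particular codebase:

```python
def classify_triangle(a: int, b: int, c: int) -> str:
    """Hypothetical function under test."""
    if min(a, b, c) <= 0 or a + b + c <= 2 * max(a, b, c):
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


# One representative input per partition, plus boundary cases.
CASES = [
    ((3, 3, 3), "equilateral"),
    ((3, 3, 5), "isosceles"),
    ((3, 4, 5), "scalene"),
    ((1, 2, 3), "invalid"),   # degenerate: a + b == c
    ((0, 4, 5), "invalid"),   # non-positive side
    ((1, 1, 10), "invalid"),  # violates the triangle inequality
]

for args, expected in CASES:
    assert classify_triangle(*args) == expected, (args, expected)
```

Each row covers one way the code has to treat its input, so six cases exercise every branch without pretending to enumerate the whole input space.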

Tautological tests are not necessarily useless. There is a lot of value in testing for unexpected behavior. The tautology SHOULD pass trivially, if everything behaves the way you expect it to. If the tautological test fails, that means your expectations are not correct and you need to learn more about the technology you're working with.

Hard to argue with this logic:

1. Poorly designed unit tests are a waste.

2. Most code is poorly designed.

3. Therefore most unit tests are a waste.

To be fair, there's no shortage of high level advice on how to write good tests (e.g. "each change in the code should break just one test"), but it's hard to find examples of how to do so in practice (unless you're writing yet another calculator app)...

> Hard to argue with this logic: ...

Agree.

> but it's hard to find examples of how to do so in practice

Yeah, I think that is because every architecture/design/component/layer seems to need a different testing approach. You really have to look at the specific case to come up with a good strategy.

First of all, let me describe my bias. I don't write unit tests. I have written a giant, well-organized framework which is used by several of my web apps. I don't even have integration tests at this point. My apps are my integration tests.

Whenever I make a change, I am very careful to reason about what it affects and then proceed to test it manually across the apps. Still, bugs crop up an alarmingly large fraction of the time - probably 0.25%.

For any team, I wish we would do TDD. It's as simple as this: you want an automated system that will signal an alarm, just like a compiler. This is especially useful for weakly typed and duck typed languages.

Right now, we do have a system but it's not automated. It's better than nothing - having many clients of an API who use it heavily allows us to make sure we didn't break anything significant.

However, at this point, I would go for API Unit Testing first. Meaning - every function exposed by the API should have unit tests matching the documentation. You can document internal functions later, but start with the most outward facing ones.

There is a second consideration: VERSIONING

As for us, we have a system where code that potentially breaks existing API contracts is strictly kept in a branch. This branch is then imported by all stakeholders and tested. Once they sign off that their clients are compatible, it is merged into the mainline with a version number and a notice of breaking changes. Everyone pulling has to read the breaking changes before updating. If they aren't ready to deal with those changes, they have to clone their own repo of the framework and cherry-pick commits until they ARE ready.

On the good side, our installer automates all this. The framework and plugins keep track of version numbers for the db schemas, apis, everything, and signal when something is out of date.

Automated systems are better than blaming humans. They are worth up to 50% extra time of the project.

I should say I meant 25% in the above. I need to improve my patience before committing. And for that we need to put in place a good process, with a bigger team and checklists.

It just seems your system is overly complicated and too hard to reason about logically. Simplify it.

Unit testing is supposed to help you design better code. It's a design process that helps you find where you might be tightly coupling things. Are you mocking an external library? Ok, but how? Are you mocking the library directly? Hmmm, maybe you've lashed your code too tightly to that external library. Maybe that means you need some intermediate layer that you do control, so that when Braintree gets bought by Paypal you can change your payment processor by changing the middleware you built.

A unit test tends to expose those tight couplings so that you and your team can decide if you want to keep going down that road or not.
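A minimal sketch of such an intermediate layer, in Python. Every name here (`PaymentGateway`, `FakeGateway`, `checkout`) is hypothetical, and no real payment SDK is called:

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """The seam our code owns. Name and signature are hypothetical."""

    def charge(self, customer_id: str, amount_cents: int) -> str:
        """Charge the customer and return a transaction id."""
        ...


class FakeGateway:
    """In-memory test double; a real adapter would wrap the vendor SDK."""

    def __init__(self) -> None:
        self.charges: list[tuple[str, int]] = []

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.charges.append((customer_id, amount_cents))
        return f"txn-{len(self.charges)}"


def checkout(gateway: PaymentGateway, customer_id: str, prices: list[int]) -> str:
    """Business logic depends only on the protocol, never on the vendor."""
    total = sum(prices)
    if total <= 0:
        raise ValueError("nothing to charge")
    return gateway.charge(customer_id, total)


gateway = FakeGateway()
receipt = checkout(gateway, "cust-42", [1200, 350])
assert receipt == "txn-1"
assert gateway.charges == [("cust-42", 1550)]
```

Swapping payment processors then means writing one new adapter that satisfies the protocol, while `checkout` and its tests stay as they are.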

"If it's too hard to test, maybe you should look at the design." Best advice I ever got on TDD.

It's a tool, not a dogma. Use it like a tool to write better code, and get regression tests as a side effect.

My feeling is that unit testing is so popular among developers because it's developers that get to decide the requirements. Don't really feel like working on that mundane piece of new functionality? Then spend a couple hours doing code golf and figuring out how to test this four-line async piece of code. Nevermind that the level of abstraction required to do so will add twelve lines and five new parameters to your method and ripple all the way through your architecture. But hey you can still charge the 8 hours for that because "unit tests add value".

IMO unit tests are a business requirement and should be handled as such. Unit tests that go beyond what is required should be deleted, just YAGNI.

Regarding the point about a test conveying little information if it never fails: That's true if it has never failed in the past and will never fail in the future. E.g. maybe it really is tautological or it's testing legacy code that has been locked down forever. But if it does fail at some point in the future after having passed in dozens of iterations, that failure will convey a lot of information. Every time it passes, the information-theoretic value of that future failure increases. Unless running the test involves some onerous cost, there is no reason to delete it. Just hide it in a folder where you won't have to look at it.

"Object orientation slowly took the world by storm.."

That sentence doesn't even make sense.

In my opinion, I see unit test/integration tests as a way to refactor with confidence. It also gives the added benefit of allowing one to test your application with many different versions of dependent software.

My current project example is porting my Django application to multiple versions of both Django and Python. I can now start to make my application work with Python 3 just by running tests. So yes, some of the tests may be testing things that should always be correct, but it helps in other ways.

A few things many coders don't worry about enough when it comes to testing:

1: Most test code is of lower quality than production code.

2: Tests often contain assumptions and those assumptions are usually not checked against any external reference. Whatever is assumed within a test is de facto law. Consider mocks: what ensures that mocks have the same behavior as the objects or services they're mocking? Mostly it's just an untested assumption.

3: By far one of the biggest classes of errors in coding is omission. And there, tests barely protect you at all. If you forget to implement a particular feature/aspect of code, chances are that you're going to forget to implement a test for it as well. The way to solve these problems is with thorough code review and integration/beta testing.

Ultimately you get into a "who tests the testers?" problem. Which, most of the time, is answered with the resounding noise of crickets. Tests need code review. Tests need owners. Tests need to be challenged. Tests need justifications. A lot of the critical rigor surrounding testing is eroded by common practices which encourage a distinct lack of rigor around test writing (TDD, I'm looking at you). Tests aren't magic, they're just code; they'll tell you what you ask, but if you're not asking the right questions the result is just GIGO.
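On point 2, a small Python sketch of how a mock's assumptions can drift, and one partial guard against it. `MailService` and `notify` are hypothetical; `create_autospec` is the standard-library helper, and it only checks call signatures, not behavior:

```python
from unittest import mock


class MailService:
    """Hypothetical collaborator; imagine it talks to a real server."""

    def send(self, to: str, subject: str) -> bool:
        raise NotImplementedError("network call omitted in this sketch")


def notify(service: MailService, user: str) -> bool:
    return service.send(user, subject="Welcome")


# A bare Mock accepts any call, so it cannot notice a wrong signature.
loose = mock.Mock()
loose.send("a@example.com", "hi", "unexpected extra argument")  # silently OK

# An autospecced mock checks calls against the real class's signature.
strict = mock.create_autospec(MailService, instance=True)
strict.send.return_value = True
assert notify(strict, "a@example.com") is True
strict.send.assert_called_once_with("a@example.com", subject="Welcome")

try:
    strict.send("a@example.com", "hi", "unexpected extra argument")
except TypeError:
    signature_mismatch_detected = True
else:
    signature_mismatch_detected = False
```

Even the strict version still assumes that `send` returns `True` on success, so the behavioral assumption remains unchecked unless something exercises the real service.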

Too many devs think that unit tests are cruise control for quality. They're not. Doing testing right should be just as rigorous and just as difficult as implementing features.

And since it's impossible to have error-free code, and tests are code, therefore it is impossible to have error-free test code.

Should there be tests for the tests? And tests for those tests?

(Ohhh I see where you're going...)

Turtles all the way down, right? The point being that ultimately you need to have processes other than automated tests to ensure code, product, and test quality. Otherwise you're just shifting the problem around. Bad tests can be just as hazardous as bad code, if not more so, since they can easily waste lots of development resources which could have been used more productively.

Which is why I would rather spend time designing on paper than on the computer, and enter into the computer what I know to be logically correct.

I've been building my first full-blown Ember app since the start of the year. I haven't got up to speed with testing yet (as opposed to my usual cucumber/rspec combo with Ruby). I can tell you it is painful! I'm used to being able to change code on a whim, safe in the knowledge that my test suite is going to tell me about any regressions. We're using ember-data and there are some changes (like changing a model's association from sync to async) that can have consequences far from the code that you're working on, so we've been seeing a lot of regressions. This has meant we're doing a huge amount of manual testing (I'm the only dev but there are 2 biz guys, so this balance works at the moment). But I'll definitely be spending the rest of the week getting up to speed with Ember testing. Maybe you can get away without tests if you have a really strong type system (i.e. better than Java's), but for dynamic languages I think you're in for a world of pain without that safety net.

The way I found to maximize this is:

1. TDD is for when I am learning something new in programming, as it keeps my wrong assumptions in check.

2. BDD is for when the learning curve has flattened out, and it helps verify the behavior of the system.

I use a combination of both, but generally my number of TDD tests decreases once I reach a certain advanced level in a subject, while my BDD tests generally increase.

I find this puzzling. Now don't get me wrong: I'm no fan of TDD, but if I understand TDD proponents correctly, it's a design tool, not a testing tool. I assume your software will always need designing. So are you saying that as you get more familiar with programming you need less design? Or maybe instead of TDD you meant plain unit testing?

I disagree with many things in this article.

For example, he says that you should throw away tests that haven't failed in a year. I say simply disable them and put them in a separate test suite, maybe called a validation suite. If the code that they test is changed, you can re-run those tests again to ensure no expected functionality is broken.

In my opinion there are two things that make unit tests really useful: API design and refactoring security. The article didn't talk about either of these.

Writing unit tests creates a mindset where the API for a class/method/module will be designed in a decoupled way to ensure that unit testing is possible. Writing unit tests forces the programmer into creating software that is decoupled and modular.

Your unit test suite is also a formal specification and documentation of what the program is supposed to do. This drastically reduces concerns and issues when refactoring the code base, because if the tests are still passing, the changed code has, with high probability, not introduced bugs. This of course relies somewhat on the tests actually being meaningful and not tautological.

As pointed out in the article striving for 100% test coverage is not a good measurement of quality. Some things simply don't need to and shouldn't be tested.

Okay, I'm not familiar with this paper, and I've only read the history intro so far, but it was bad enough I had to stop and comment.

It's just wrong on many different levels... For starters, the point about which function gets called being determined at runtime... That's a function of dynamic binding/function selection, not object orientation.

You can have OO software that doesn't have any runtime polymorphism, and you can have non-OO software that is entirely based on dynamic dispatch.... and assuming the logic is such that you really can't know how it will be resolved until runtime, the problem is no better than static code, because the static code effectively becomes a branch based on a dynamically calculated runtime value NO MATTER THE CODING STYLE.
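To make that concrete, a toy Python sketch with no classes at all. The shape dictionaries and function names are made up for illustration:

```python
def area_circle(shape: dict) -> float:
    return 3.14159 * shape["r"] ** 2


def area_rect(shape: dict) -> float:
    return shape["w"] * shape["h"]


# Non-OO dynamic dispatch: the callee is chosen from a table at runtime.
AREA_BY_KIND = {"circle": area_circle, "rect": area_rect}


def area_dispatch(shape: dict) -> float:
    return AREA_BY_KIND[shape["kind"]](shape)


# "Static" style: the same runtime decision, written as a branch.
def area_branch(shape: dict) -> float:
    if shape["kind"] == "circle":
        return area_circle(shape)
    if shape["kind"] == "rect":
        return area_rect(shape)
    raise KeyError(shape["kind"])


shapes = [{"kind": "circle", "r": 1.0}, {"kind": "rect", "w": 2.0, "h": 3.0}]
assert [area_dispatch(s) for s in shapes] == [area_branch(s) for s in shapes]
```

Either way, which function runs depends on a value known only at runtime, so the test burden is the same in both styles.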

I could go on... but I'd have to write as much as the author has.

From the article:

>In most businesses, the only tests that have business value are those that are derived from business requirements.

This 10000 times over. I write lots of tests for financial software. My tests are always longer than my code. However, my tests are always written to business/legal requirements and never to the code in a financial software context. Moreover I try very hard to ensure the tests can be run on production software safely, so we know for sure whether or not things are actually working as expected in the field.

For application frameworks it is different. The tests are written to the specification, not to the code.

But if I had a dollar for every time I have seen a test case that shouldn't be there, because it tested some non-requirement, or worse (some behavior of a dependency).....

The argument seems confused.

'Few developers admit that they do only random or partial testing and many will tell you that they do complete testing for some assumed vision of complete. Such visions include notions such as: "Every line of code has been reached," which, from the perspective of theory of computation, is pure nonsense in terms of knowing whether the code does what it should.'

It is as if he had never heard of 'use cases', which in fact are not mentioned anywhere in the text - a pretty glaring omission. The portion of the code that is reached is irrelevant as long as you have covered all of the use cases of the system. Ideally code that is not run for any of the use cases should be removed but it does not prevent the software from being fit for purpose.

Much unit testing seems, paradoxically, to exist more to justify OOP than to really test functionality. You'll never need to swap databases (and your DB abstraction layer probably wouldn't isolate all those problems anyway), but hey, it lets you substitute mocks that let you verify that your implementation of the mock operates according to the code you happen to have written. So let's add that extra layer of abstraction, break all our "go to definition" IDE actions, because now we can prove that when we change DBs it's the DB that is wrong because it doesn't correspond to the mock we hacked together.

I think it's important to note that the title says most, and doesn't say 100% of the time unit tests are bad. It specifically mentions using them for critical functions/algorithms, or 'units of computation'.

But, he outlines some of the less useful parts about it, common mistakes, and my takeaway from the article really hits spot on with some poor experiences I have had with teams that get lost in unit testing, and it really can lower the code quality and simultaneously the velocity if not approached carefully.

Some key quotes from the article:

"If you find your testers splitting up functions to support the testing process, you’re destroying your system architecture and code comprehension along with it. Test at a coarser level of granularity."

This is all too common, especially with inexperienced developers. Management requires 40, 60, 80, or 100% test coverage blindly, without thinking about whether it makes sense to test that particular part of the code, and without taking into account readability or, in my experience, the pain of over-abstracting something. Little is worse than trying to debug a program while dealing with over-abstraction hell, to the point where all you can read is the tests, and the source code has become entirely overcomplicated, all in the name of keeping the code in a format where it's testable, at the cost of it being understandable.

Developers that have a lot of experience with system design and software architecture are in a much better place to write appropriate tests while still maintaining understandable source code, but if I had my choice between an over complicated codebase with 80% test coverage, or a more simple codebase with 0% test coverage, I would choose the simple codebase every time.

"There’s something really sloppy about this ‘fail fast’ culture in that it encourages throwing a bunch of pasta at the wall without thinking much... in part due to an over- confidence in the level of risk mitigation that unit tests are achieving."

In modern development shops that do a lot of TDD, the tests are relied on way too much. Testing of any sort is not a silver bullet. But you find people, even in pretty big, mainstream development shops of large internet properties, relying almost solely on this. Then they pass their 'finished work' over to operations to be deployed, and when something breaks because there was not a test that counted how many file descriptors were used, the developers are always so quick to say 'well, all the tests pass, so it's an operations problem now'.

"However, many supposedly agile nerds put processes and JUnit ahead of individuals and interactions."

In the article he comments on how someone told him that debugging isn't what you do in front of a debugger (obviously debatable) but that it's also when you're staring at your ceiling, or discussing the inner workings of a program or algorithm with a counterpart. This is so key, and this is why pair programming is often helpful if you get into a good rhythm with someone. Thinking intrinsically about how software works is the takeaway here, and all too often we see people rely on tests as a silver bullet, and the end result can be code that is overcomplicated, over-confident, and, when deployed, an operational nightmare. These sorts of things often cause a giant net loss in revenue, due to the net loss in a team's velocity to ultimately produce code that works rapidly. When developers lean on tests less (but still employ them where it counts) you'll find easier-to-maintain code, written by people who will step up to the plate and be responsible for that code.

Obviously there are exceptions to this, and there are shops that have the right balance, and maintain high quality understandable code, while maintaining high velocity. Personally, in over 10 years of being in this industry, almost all examples of this that I have witnessed are open source projects that are peer reviewed.

>> If you find your testers splitting up functions to support the testing process, you’re destroying your system architecture and code comprehension along with it. Test at a coarser level of granularity.

The author has horrible reasoning. Splitting up large or complicated functions is almost always a good thing.

> if I had my choice between an over complicated codebase with 80% test coverage, or a more simple codebase with 0% test coverage, I would choose the simple codebase every time.

This is a false choice. I prefer a simple codebase with 80% coverage. The notion that highly tested code must be complex is simply not true.

The author's reasoning is fine.

"If you find your testers splitting up functions to support the testing process"; he's condemning splitting the function for that PARTICULAR rationale; he's not universally condemning splitting the function.

Of course it's a false choice. He's not saying you can't have both. He's using it as an illustrative example that if you're making the code more complex just to make it more easily testable (see prior point), then you're choosing the wrong thing to do.

One thing to consider here though is that often times when you realize splitting up a function will make it easier to test, it's because your implementation sucks and is doing too much and too tightly coupled. I mean realistically how would splitting up a function make testing it easier unless the function is already complex and performing multiple tasks?

You can certainly argue that some of the clean code folks do a lot of needless abstraction that makes it harder to work on code, and I think that's true at times. But at the same time, a 200 line method doing 19 different things is also quite hard to understand and modify, and the reason testers want to split that method up is because it's really hard to understand and has too many possible outcomes.

I don't like to overly abstract things and I try to strike a balance here, but I can say without a doubt that I've never found it harder to understand and work on a single class with 20 methods that each do one thing (with descriptive method names) than I have a method with 200 lines of code doing the same 20 things. And the former is much easier to test as well.

The idea that splitting up functions makes code more complex is ridiculous. The author claims splitting up functions is "destroying your system architecture". He's trying to claim the exact opposite of what usually happens.

If the code needs to be split up to support testing then it's likely that the code should be split up to support other development. Splitting up large functions generally makes software better. Whether that splitting is done as part of normal refactoring or is motivated by a test suite seems irrelevant. Saying that small methods lead to complex code is insane.

"likely". "generally". I.e., not always.

If your only motivation is "it makes it easier to write tests", and there is no other gain, it falls into the remaining case that you even allow for. You're now splitting functions that don't make sense to be split, solely for the sake of making testing easier. And that is bad. A lovely discrete chunk of abstraction is being split across two functions, that you would never call separately, solely to aid testing. And that is bad. That is all this article is asserting with the statements you quote.

Nowhere does the article acknowledge that method decomposition is a valid software practice. It's pretty clear he considers splitting up functions to be bad regardless of the motivation. Like others have said: if a large function is too complex to test, then the codebase is probably improved by splitting it up. That is a benefit of testing and not a downside.

Positively brain-damaged.

It's not like unit tests are a magic bullet, or applicable to every situation, or undeserving of criticism, but both the criticisms and recommendations here seem poorly founded at best and frequently harmful.

The problem with unit testing, along with every development methodology du jour, is that it fails to account for the fact that some devs are good and some, quite frankly, suck. There really just isn't any way around it. Unit testing does solve a subset of development problems. It also creates a whole new set of problems when managers expect perfect software because it was TDD'd. The most important trait I've seen in good developers is a deep understanding of the problem domain in which they are working. If you don't know the how and why of what you are doing, nothing can save you.

Question to the TDD crowd: should we write test cases to test the test cases? If your test cases are always correct and don't need verification, then why don't you write code that is always correct in the first place?

Something something turtles all the way down.

When you write a unit test, you are writing code that expresses intended behavior in a different way than the original implementation. The probability of making exactly corresponding errors (i.e. the implementation is wrong but the test incorrectly checks it and erroneously passes) is lower than the probability of making errors in the original implementation independently of any tests. If your test is incorrect, chances are good that it is wrong in a different way than your implementation, and ideally this will trigger a failure that will lead you to recognize your mistake and fix it.

If you do not feel that first-degree tests offer enough confidence in the correctness of your code, then by all means write tests for the tests. But that many will find this idea absurd demonstrates that the cost/benefit ratio diminishes the more meta you get. (EDIT: Especially since integration and other tests also help contribute to the confidence that the code is correct.)

Alternatively, if your unit test is so similar to the implementation itself that corresponding errors (between the implementation and the test) are likely, then it is probably a poorly written test and of little value.
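
To make the difference concrete, here is a minimal Python sketch. The `median` function and both tests are hypothetical, made up purely for illustration. The first test states expected results directly, so it is unlikely to share a bug with the implementation; the second recomputes the answer with the same formula, which is the kind of test described above as being of little value.

```python
def median(values):
    """Implementation under test (hypothetical example)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def test_median_independent():
    # States the expected results directly, so a bug in the index
    # arithmetic above would not be repeated here.
    assert median([3, 1, 2]) == 2
    assert median([4, 1, 3, 2]) == 2.5
    assert median([7]) == 7


def test_median_tautological():
    # Mirrors the implementation line by line; any mistake in the formula
    # is copied into the expectation, so this can pass on wrong code.
    values = [4, 1, 3, 2]
    ordered = sorted(values)
    mid = len(ordered) // 2
    assert median(values) == (ordered[mid - 1] + ordered[mid]) / 2
```

If the implementation picked the wrong pair of middle elements, the first test would likely fail while the second would keep passing.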

James Coplien and Bob Martin have a debate on TDD and unit testing. There is a video of it on YouTube:


"Unit tests are unlikely to test more than one trillionth of the functionality of any given method in a reasonable testing cycle."

Taken literally, sure. But are you seriously implying that it's useful, or even sensible, to write test cases for every possible input to the line "z = x + y"?

(previous HN article on testing floating-point implementation details notwithstanding)

Unit tests are a tool for understanding. Their value does not come from targeting coverage and formal correctness.

At the end of the day, we have to convince ourselves that our code works. We can do that by looking at it and seeing what it does, and by writing tests to demonstrate that we are not lying to ourselves about what we've reasoned.

Write tests for things that you are curious about or don't understand - those are the things that need to be tested.

Using something like QuickCheck, you can generate a lot of different input for one function, including edge cases and everything in between.
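
For anyone who hasn't seen the idea, a rough sketch in plain Python follows. This is not QuickCheck or any real library's API, just a hand-rolled loop over random inputs using the standard `random` module; the run-length functions and `check_round_trip` are hypothetical names invented for the example. Real tools also shrink failing inputs to a minimal case, which this sketch does not attempt.

```python
import random


def run_length_encode(text):
    """Hypothetical function under test: 'aaab' -> [('a', 3), ('b', 1)]."""
    encoded = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials=1000, seed=0):
    """Property: decoding an encoded string gives back the original."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        length = rng.randint(0, 20)  # includes the empty-string edge case
        # A two-letter alphabet makes long runs of repeats likely.
        text = "".join(rng.choice("ab") for _ in range(length))
        assert run_length_decode(run_length_encode(text)) == text, text
    return trials
```

You state one property and let the generator find the inputs, instead of hand-picking a few examples.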

Unit tests suck if you are not writing properly testable code to begin with. If your classes and objects are not written to be separate from each other to a decent degree, any change has widespread effects.

When you properly componentize your code, unit testing becomes an invaluable tool for refactoring the inner workings of classes without affecting their method footprints.

I do most of my work for mobile, and in that space unit testing generally feels like this: http://xkcd.com/1319/

The vast majority of your typical app is made up of untestable UI, or framework methods and API calls that should be doing their own testing.

Now, INTEGRATION testing on the other hand...

I can't seem to justify writing tests. It is time consuming and cumbersome. I figure as long as I document reasonably well and the design is straightforward, there is no sense in wasting my time writing tests. I'd prefer to get something done instead of supposedly helping my future self (or my employers).

Really depends on what your app is doing and how critical it is to a business. For a complex application that has been in production for years, whose original developers weren't very talented and are now long gone, where your business relies heavily on customer satisfaction and bugs can cost you customers (and thus revenue) or expose protected data, tests are absolutely critical.

On the other hand, a hobby project used by a few people or internally at your company where bugs can be discovered in use and aren't usually a big deal, eh.

And of course, there are a lot of levels between the two extremes...

An interesting debate: "Jim Coplien and Bob Martin debate TDD": http://www.youtube.com/watch?v=KtHQGs3zFAM

I usually only write unit tests for external open source libraries I am the author of, or ones for other libraries when I can think of a test case they aren't already checking for.

Objects can substantially reduce and simplify testing, if you make guarantees about them during construction, keep them immutable and abide by Demeter's Law.
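
A small Python sketch of the first two points, assuming a made-up `Percentage` value object (the class and its `of` method are hypothetical). Because the range check runs at construction and the instance is frozen, no later code needs a test for "what if the value is out of range".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """Hypothetical value object: always holds a value in [0, 100]."""
    value: float

    def __post_init__(self):
        # The guarantee is enforced once, at construction time.
        if not 0 <= self.value <= 100:
            raise ValueError(f"percentage out of range: {self.value}")

    def of(self, amount):
        """Return this percentage of the given amount."""
        return amount * self.value / 100
```

`Percentage(150)` raises `ValueError`, and assigning to `value` on an existing instance raises `dataclasses.FrozenInstanceError`, so the only tests needed are for the constructor and for `of`.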

short version "programmers can mess up unit tests"

let me suggest: http://www.amazon.com/Growing-Object-Oriented-Software-Guide...

