Most Unit Testing Is Waste (2014) [pdf] (rbcs-us.com)
192 points by fagnerbrack on June 28, 2019 | 159 comments

My experience with testing is like that old adage about advertising, "I know I'm wasting 50% of my money but I don't know which 50%." Most of it is a waste but it's hard to know in advance which test will stop an engineer in a couple of years from altering some fundamental contract and bringing the system down.

I have a 3 part article on testing that I need to post...

Regarding wasting time on 50% of tests... it depends what your system is. In most systems (the non-mission-critical ones, not general libraries) I like to use the 80%/20% rule. I always ask: what are the 20% of the tests that deliver 80% of the value?

You learn with experience but if you do not have it, this is a simple way to learn (as a team).

During the sprint, have people write down:

1 - The tests that actually failed and found a real problem before the code was shipped.

2 - The tests you wrote for the bugs that were found in production during the sprint. If you haven't written the test yet, ask: what kind of test would have found this bug?

Have people in the team share these in the retrospective meeting. If you need more time at the beginning, you can hold a meeting just for bug sharing. Discuss the patterns. Commit to writing more of these kinds of tests.

In a few sprints, you should see regression bugs going down. If you also take a look at the tests that never fail, you will also have a sense of the bugs you should not write.

Let me know how it goes!

> If you also take a look at the tests that never fail, you will also have a sense of the bugs you should not write.

Just checking: Did you mean "the tests you should not write"?

I'll say that any unit test for a bug which would have been caught by a more sophisticated type system is a waste. I don't know how much time people spend writing such "obsolete tests", but I doubt it's insubstantial.

What if most type system security is a waste?

I don't personally believe this is the case, but I also can't tell you I know it to be false. What I do know is that types have a non-zero cost, and I think sometimes that's ignored, and only their benefits are acknowledged. But the efforts to bring types to historically dynamic languages show that not all common dynamic language idioms are amenable to reasonable type signatures. While some idioms (e.g. monkey patching) are problematic, that's not the case of all of them.

Apart from a bit more typing up front, I can't see many costs to typing.

Try programming in Rust. The upfront cost of figuring out how to write the code to minimise unsafe blocks can be very high. This is because Rust's type system is so strong it both does the work of the garbage collector and catches things other languages would miss, like memory leaks.

That aside, that "bit" of typing can amount to a lot. Python programs are usually very compact compared to statically typed languages. Yes, this means writing more unit tests to cover the bits the type checker would have caught, but in my experience the total LOC with unit tests for 100% coverage in Python is still less than in a statically typed language.

That said, I still prefer statically typed languages. The big difference between static typing and unit tests is when the error is caught. Static typing catches the errors very early. In fact, in modern IDEs that do incremental compiles it's literally after you have typed the statement. Errors that hang around tend to ripple through your code because your mental model is wrong, so catching them early means fewer errors in total. The effect can be large.

Python is very compact even with heavy use of type annotations. It's not the types that add the bulk of the code, other languages usually are more verbose in other ways also.

In a statically typed language, you sometimes have to make the code more verbose to satisfy the type checker. In Python, you can just not use it in those cases.

There are costs, and they go up significantly the more sophisticated your type system is/the more specific you try to be with your types.

The way I see it there are two kinds of costs type systems may impose. Very simplistic type systems, such as the ones in C, Go, and even C++, Java, C#, often force you to write code that is too specific, thus forcing duplication (C and Go especially suffer terribly from not having generics - in Go especially it's impossible to write a well typed function that can be passed an array of any kind and return an element of that array).

On the other hand, if your type system is powerful enough, you still get a similar problem when trying to write code that is abstract enough to be reusable, but specific enough to be safe.
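As a point of comparison, here is a minimal sketch of the kind of signature the comment says Go could not express at the time, written with Python type hints. The function name `first` is made up for illustration, and the annotations are only checked by an external tool such as mypy, not at runtime.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")  # stands for "whatever element type the caller passes in"


def first(items: Sequence[T]) -> T:
    """Return the first element; the return type follows the element type."""
    if not items:
        raise ValueError("empty sequence")
    return items[0]


name = first(["ada", "grace"])  # a checker would infer str here
count = first([3, 1, 2])        # and int here
```

One definition covers both calls, so nothing has to be duplicated per element type.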

For example, if you were writing code for a physical simulation system, you would want to have all quantities carry their measurement unit in their type, to avoid mismatches.

However, you wouldn't want to rewrite maths code for each measurement unit, and properly keeping track of types in complex maths code gets pretty ugly pretty quickly. For example, a function that wants to compute the scalar product of 2 2-dimensional vectors would only work if the first values of each vector have the same/convertible units as the second pair, in any order (m,s dot s,m should be ok). If you move to matrix multiplication, the types become even more complicated, and this is still very basic algebra. If you want your function to be applicable to arbitrary matrix sizes you're already writing a complex library.

I have two thoughts on this.

First, statically typed languages often simply subtract features that would be available in dynamic languages. So, that's a cost that's paid on day zero, and it can be easy to forget about.

Secondly, there are absolutely times when something that is conceptually easy to describe is a real beast to model in a type system. Think DSLs for SQL, for example. A non-trivial amount of time can be lost fighting these battles.

That's because some of the cost you pay is in changing the way you think, so that it doesn't even occur to you to write programs that the type system won't like.

They can also be more difficult to refactor and adjust at a later date. This can be especially problematic if you're in an environment where logic needs to be altered rapidly in response to changing external conditions/requirements.

The added "difficulty" is just the cost of doing a proper, correct refactor.

Dynamic languages are not helpful just because they let you write bugs faster.

I wouldn't even call that difficulty. That is a bloody loved feature in my book.

Developers are expensive though. If dynamic languages let you write buggy code in half the time that it would take to write perfect code in a static language, that would be an acceptable trade-off in many contexts.

> If dynamic languages let you write buggy code in half the time that it would take to write perfect code in a static language, that would be an acceptable trade-off in many contexts.

Shit at 50% off is still shit. You're not being paid to produce shit, even if you might be really, really efficient at doing so.

I assure you that many developers are being paid to produce shit.

Fixing bugs later on gets progressively more expensive. I thought we had got over the KLOC metric decades ago.

Can you give some examples why? Do you mean changing a variable's type? That should just be a matter of a find and replace.

That's because you (and many others) are drinking the static typing Kool-Aid.

Now try serializing things in XML instead of JSON. But not one type of message, multiple types, with flexible formats.

"Oh but this never happens" except it does.

Not to mention the iteration speed whenever you need to fix something.

?? sorry what do you mean, I was happily reading multiple types of logs with multiple different format records for a Billing system using statically typed PL/1G in the 80's

Apart from that, both JSON and XML are a PITA to work with in the first place.

JSON is wonderfully simple to work with - in a dynamic language. Having to map it to a statically known schema, as you would in a statically typed language, loses that simplicity and can be, as you say, a PITA.

I think that's a defect of JSON not statically typed languages.

"defect of JSON": while it could be better, it has some type definitions, and as I said, people are too deep in the static world to see outside of it.

The problem is that sophisticated type systems only catch a subset of the bugs that a unit test can catch. For example, let's say I'm adding the ability to transfer funds from one account to the other in a banking application. I want to display a warning when the amount of money being transferred is over a certain percentage (let's say 95%) of the funds in the account. That's pretty easy to do in a unit test: create mock account, call the transferFunds() method, and verify that the warning is triggered when the value being transferred is over 95% of the amount in the account.

How would I do that with a type system?
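The unit test described above could look roughly like the sketch below. Everything in it is assumed for illustration: the `Account` class, the `transfer_funds` signature, balances held as whole cents, and the warning recorded as a string on the source account. None of it comes from a real banking codebase.

```python
from dataclasses import dataclass, field

WARNING_THRESHOLD_PERCENT = 95  # the example threshold from the comment


@dataclass
class Account:
    """Hypothetical in-memory account; balance is in whole cents."""
    balance_cents: int
    warnings: list = field(default_factory=list)


def transfer_funds(source: Account, target: Account, amount_cents: int) -> None:
    """Move money and record a warning when the amount is over 95% of the balance."""
    if amount_cents <= 0 or amount_cents > source.balance_cents:
        raise ValueError("invalid transfer amount")
    # Cross-multiplied integer comparison: amount / balance > 95 / 100.
    if amount_cents * 100 > source.balance_cents * WARNING_THRESHOLD_PERCENT:
        source.warnings.append("large transfer")
    source.balance_cents -= amount_cents
    target.balance_cents += amount_cents


def test_warns_above_threshold():
    source, target = Account(10_000), Account(0)
    transfer_funds(source, target, 9_600)  # 96% of the balance
    assert source.warnings == ["large transfer"]


def test_no_warning_at_threshold():
    source, target = Account(10_000), Account(0)
    transfer_funds(source, target, 9_500)  # exactly 95%, which is not "over"
    assert source.warnings == []
```

The two tests pin down the boundary on both sides, which is the part a rewrite is most likely to disturb.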

This is very much not the kind of bug that the parent was talking about. He's talking about stuff like passing an int value to a (supposedly) string parameter. Basic type stuff.

I also agree, as years ago I wrote a large amount of javascript which had type assertions pretty much everywhere. They didn't take long to write but I'm very glad I did them, and I'd rather have the language do it to save me time, clutter and maintenance.

Lax typing is a real cost. I've been working on some SQL I inherited and even SQL's not-too-bad typing allowed the original coder to mix types where I wouldn't, and create potential runtime errors eg. to assign the contents of a 64-bit integer field to a 32-bit integer field. That's legal and gives no warning. It's also suddenly a runtime error when it exceeds 2^31 (and there are other possible consequences such as screwing up the optimiser). I'd much rather it forced me to match types exactly, or explicitly cast.

The parent never claimed that a type system means bug free code, or replaces all unit tests. However, it does mean you need fewer tests.

In a typed, compiled language you get some pretty nice guarantees just by getting your code compiled. In a dynamic language, you have no way to know if your code even runs until you have 100% code coverage.

To my mind, a unit test should test a unit: something which functions independently, and which is interacted with through an abstraction layer. Type systems are good tools for clarifying and catching bugs around these abstraction layers.

Maybe my understanding is wrong, but what you described doesn't seem like a unit test. I think the appropriate term for a test like this at the junction of UI, business logic and code-level triggers is 'feature test', 'integration test', or maybe even just 'test'.

No mainstream type system can do that so GP would be fine with a unit test here.

You could use Coq, Isabelle, or Lean. Their type systems are powerful enough to allow such checks.

If anyone does have this case, it's two distinct functions in my mind: the first function does the comparison and returns the warning.

You can then assert and type against the warning function. The printing function you could only assert against with output buffering or something like that, but in my mind that's less likely to go wrong anyway.

Keep your type system simple; custom types beyond built-in primitives cause headaches, IMO.

At the risk of veering off topic, I'm genuinely curious what kind of bug that unit test is designed to catch?

Given that transferFunds shows a warning over that 95% threshold, what type of change is going to break that logic? Some sort of botched code rewrite where values don't retain their meanings? That batchMode shouldn't trigger the warning? That race condition between the check and the actual transfer?

Most of the unit testing I see is more like test-of-definitions (is fullName still 80 characters wide and doesn't accept null bytes?) which confuses me because a definition can only be correct or incorrect in context.

I accept the usefulness in dynamic languages in the absence of types. My personal preference is for tests to be a bit higher level (does login with a username containing null still cause an exception?) but I have come to terms with the fact that few people agree with me across multiple organizations, so there must be some point to this trivial testing that is completely lost on me.

> My personal preference is for tests to be a bit higher level (does login with a username containing null still cause an exception?).

These tests are great, but what if username has a bunch of validations on it?

For example, it's reasonable to think a username field might be validated with:

- Must be required

- Within 1-32 characters that match a certain pattern (let's say a regex to limit it to lowercase letters, numbers and -)

- Must not be a blacklisted word (admin, administrator, etc.)

- Must be unique (enforced with a database index)

Pretty standard stuff. Are you going to write 5 integration tests for this? 4 to test each validation and then the success case? These would be tests that exercise your entire web framework's routing stack from request to response (ie. the user visiting a /register URL and then submitting the form).

Personally I would not. I would write 1 unit test for each of those things (4 unhappy cases where I assert a specific validation error for each invalid input and 1 happy case where with valid input I expect 0 validation errors). In this case, the "unit" would likely be a `register_user` function that accepts params as input and either aborts with validation errors, or succeeds by writing the record to the DB.

Then, for an integration test I would have 2 tests. One to make sure with invalid input I end up with some type of error displayed in the HTML response (it doesn't matter which one), and another test with the success case to make sure things work when they should (such as the user is registered and a new record was created in the DB).

So I end up with a tiny bit of overlap in tests. Technically the unit test for the success case doesn't need to be there since the integration test covers it but I usually include it for the sake of completeness because it's usually like 4 lines of code to make that test but I'm not 100% opposed to someone saying it should be left out.
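The unit-level half of that plan might be sketched as follows. This is only an illustration under assumptions: `validate_username` is a hypothetical helper standing in for the validation step of `register_user`, the error codes are invented, and uniqueness is checked against a plain set rather than the database index the comment mentions.

```python
import re

# Lowercase letters, digits and "-", 1 to 32 characters (the rule from the list above).
USERNAME_PATTERN = re.compile(r"[a-z0-9-]{1,32}")
BLACKLIST = {"admin", "administrator"}


def validate_username(username, existing_usernames):
    """Return a list of error codes; an empty list means the name is valid."""
    if not username:
        return ["required"]
    errors = []
    if not USERNAME_PATTERN.fullmatch(username):
        errors.append("format")
    if username in BLACKLIST:
        errors.append("blacklisted")
    if username in existing_usernames:  # stand-in for the unique DB index
        errors.append("taken")
    return errors


# Four unhappy cases, each asserting one specific error, plus one happy case.
assert validate_username("", set()) == ["required"]
assert validate_username("Bad_Name!", set()) == ["format"]
assert validate_username("admin", set()) == ["blacklisted"]
assert validate_username("alice", {"alice"}) == ["taken"]
assert validate_username("bob-42", {"alice"}) == []
```

Each check runs in microseconds and never touches the routing stack, which is what makes one test per rule cheap at this level.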

> Some sort of botched code rewrite where values don't retain their meanings?

Yes, exactly. It's supposed to catch the rewrite where somebody turns (a * 100) / b > 95 into (a / b) * 100 > 95 and suddenly, the code doesn't work anymore.

Showing a warning is a full scenario - it has a whole UX. If this warning is important enough to have a PM, a UX designer, and be translated into 20 languages, it's important enough for the engineer to make sure it actually shows up when it's supposed to.

I think the point is that if that was written in JS you would have a test to ensure your method behaves correctly with characters passed in, or strings, or just nil (/null?), whereas in Java et al. you don't need any of those tests: you just make your parameter some sort of numeric and that just isn't a problem any more.
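A small sketch of why that rewrite breaks, assuming the values are integers and the language truncates on division (the comment does not say so explicitly). Python's `//` operator is used here to mimic integer division; the function names are made up.

```python
def warn_original(a: int, b: int) -> bool:
    """Multiply first, then divide: the percentage keeps its precision."""
    return (a * 100) // b > 95


def warn_rewritten(a: int, b: int) -> bool:
    """Divide first: a // b truncates to 0 whenever a < b."""
    return (a // b) * 100 > 95


# Transferring 96 out of 100 should trigger the warning.
assert warn_original(96, 100) is True
assert warn_rewritten(96, 100) is False  # the regression a unit test would catch

# The rewritten version only fires once a reaches b.
assert warn_rewritten(100, 100) is True
```

The two expressions are equal on paper, so the change can pass review easily; a test at 96% is what exposes it.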

While I agree that this is hard to implement with a type system, I never want to see this implemented as a unit test.

What you describe is a real use case that I personally want to have tested on a real system with all the stuff in place (but test database instead of the real one). It has nothing to do with one unit you can test on its own.

And the crucial problem here is that you tie your testing code to the inner workings of transferFunds, when you are mocking stuff.

> and verify that the warning is triggered when the value being transferred is over 95% of the amount in the account.

Please also note that you cannot verify this property using testing. You can only verify that this works for one particular set of values (or a big but finite set of values). Testing never can verify that something always works. That's the wrong tool for this job.

With TDD, writing unit tests accompanies writing code - which in itself necessitates thinking in terms of testability, which is a good thing making the program's components more separable. Regarding the former example, mocking only a single part where it connects to other parts of the system (for example some interface through which it sends warnings to users) is easy and makes the resulting code better (eg, easy to replace the notification manager).

> With TDD, writing unit tests accompanies writing code - which in itself necessitates thinking in terms of testability

I, as a user of your software, only care about the feature, and that the feature works. I don't care about testability, whatever that means. And since I care about features, I think this entails that features have to be tested end to end, and not some unrelated classes ("mocks") that are not even used in the final product.

> is easy and makes the resulting code better (eg, easy to replace the notification manager).

It might or it might not. I have seen too much code that was written in a way too complex manner, with a lot of unnecessary classes and interfaces, just for the sake of "testability". However, some features did not work every once in a while.

If you have the need for different notification backends, go ahead. Even if you need a notification manager (whatever this is) for this. But don't make code more complicated (sometimes called "test-induced design damage") without need.

Is it a waste to keep the test around until you're ready to switch to a more sophisticated type system?

It adds to the technical debt of the project. Some teams may never decide they're "ready" for a more sophisticated type system. That's fine, but they're still on the hook for maintaining those test suites which might pose significant roadblocks to refactoring or rewriting parts of the system.

You can prove a merge sort correct with dependent types. However, a typed implementation of merge sort will arguably be more difficult to mold into a TimSort if it isn't dependently typed. What do you say?

I don't have any experience with dependently typed languages. I hope (schedule permitting) to take a course in dependently typed programming (using Agda) and formal verification in the fall. I'll have much more to say on the topic after completing the course.

A question, coz I'm not sure I follow. How is a type system (as described by wiki) supposed to test domain or functionality?

Or are you talking here about a completely different type system?

What you can do with the type system depends on the language. It can't replace tests entirely, but it can usually make some more boring tests superfluous, and in more exotic languages the type system can be surprisingly helpful.

Presumably you’d need to also know that the unit tests aren’t being used locally while developers develop. It’s pretty common to run a unit test watcher while doing local development, and have that catch breakages that never make it to CI.

My take on that is to make tests using a unit test lib, but a bit more macro. Midway between unit tests and e2e. It's not pure, it's not best practices, but it's great value.

This paper is emblematic of a serious problem in the software development field: lack of empirical research. It is unsurprisingly filled with opinion and anecdote, with little mention of research in the area. And in the one actual study cited, "Does Test-Driven Development Really Improve Software Design Quality?", the author mis-characterizes the findings. Not that there is a huge set of research on unit testing to refer to--there isn't--and hence the problem.

"Software development has been characterized from its origins by a serious want of empirical facts tested against reality, which provide evidence of the advantages or disadvantages of the different methods, techniques or tools used in the development of software systems."[1] In my view, the best thing we could do is to adopt "Evidence-Based Software Engineering"[2] as other disciplines have. This is more likely to have a major positive impact than the newest and hottest language, tool, or technique.

[1] Reviewing 25 Years of Testing Technique Experiments. https://www.researchgate.net/publication/220277637_Reviewing...

[2] Evidence-Based Software Engineering. https://dl.acm.org/citation.cfm?id=999432

I agree entirely. The more tendency there is to push development as an "engineering" discipline, the more evidence based research needs to be used to solidify or disprove certain development dogmas repeated over and over again, which in many cases, seemingly have no empirical evidence beyond a few anecdotal success cases.

This is the norm in the industry, "you're doing it wrong, the best way is this way..."--based on what evidence that pertains to this case or has proven generalized success applicable here?

In most cases, it's someone's successful anecdotal experiences which worked for the specific cases they were involved with. That doesn't mean those approaches can be abstracted away and generalized to all cases, but many in this industry do that regularly and critique others' approaches based on that. It becomes this competitive ego contest: well my work was at x, y, z solving q and was successful--making me an authority on a, b, c's problems solving p because x, y, z is a leader (appeal to authority fallacy)... etc.

It's one thing to treat development as much of an art (which to me, it very well still is) but once you start treating it as a concrete discipline, you need to provide the lime, cement, aggregates, water... evidence and studies showing approaches and how they fared across controls and varied cases.

Agree with you and parent poster. I remember 15 years ago or more going through CMMI certification and there was a push to get to level 3. Our company hired a consultant who came in every eight weeks or so to track our progress, give direction etc.

On one of his visits he started interviewing engineers individually to see how things were going and I asked him what the point of it all was and I could tell he was quite taken aback. But then, not surprisingly, he said to be more efficient developing software. I then asked him if he or any organization he'd worked with had ever actually tracked the time it took to implement the process, to which he answered no. Then I asked him, then how the hell do you know if any of this is making software development more efficient?

Is there a good way to fix it? I mean, if you were building bridges, certainly another engineer could show you data about material properties and successful builds, and you could reasonably extrapolate to your new (unique) design fairly well, but that's because you're building with the same physical materials and operating under the same laws. With software, it is a bit harder to find that common ground, because the 'laws' change if you are on a different architecture or handling different kinds of 'traffic', etc. For a bridge, you can just make a bigger beam because you would never try to build close to the scale which would make the material properties irrelevant, but with a computer, you can do this, and you might start to invalidate prior evidence. A large amount of software development dogma is built around scaling up, but that's also a huge problem in every engineering & scientific discipline.

Though, to counter my own counter-point, it would be nice to have better analytical tools; more informative & accurate ways to understand & visualize how resources are actually being used when a program runs.

It is actually simple. Just look into any business where safety and security matters.

I'm in automotive [0]. There is somebody who has to sign that the software is safe. Without someone to make that signature the car does not get released.

For Free Software, look at Sqlite. There are some nice slides from 2009 [1].

In general it results in some simple rules like "100% test coverage". Of course, these rules (while simple) are not easy to satisfy and certainly expensive.

[0] http://beza1e1.tuxen.de/aspice.html

[1] https://www.sqlite.org/talks/wroclaw-20090310.pdf

I strongly agree but I warn that evidence is a lot harder than it seems because a lot of what we do is very hard to measure or even define. How do you reliably measure programmer productivity? Lines of code, inflection points, units of functionality... all have their flaws.

I did not read the article. I did read all the comments.

Unit tests for the most part aren't about "testing". They are a developer tool, used to verify that modifications (refactoring, additions, bug fixes, etc.) don't break contracts. Oh, and they show when your code is a codependent mess of poorly isolated spaghetti: if your unit tests are hard to write, the code under test is a mess. Unit tests are more useful in languages with loose or poor type systems.

Many unit tests are not written well: they test more than the interface/contract, or they are full of complexity, boilerplate and mocks. That means the code needs fixing.

Some of our older tests at work are an absolute nightmare to deal with. We went through a phase where we basically just mocked everything non-trivial and ended up with a rewording of the code itself. Any time you make a change to that code, even if it still behaves exactly the same from a user point of view, the tests blow up into pieces and you end up having to rewrite or throw out most of them.

We moved to a behavior-focused testing style, and it has been much more robust to non-functional changes and refactorings in the code.

What do you mean by behavior-focused testing style?

Presumably, testing that the API of each unit produces the desired result without consideration for how it accomplishes that internally. If it relies on some other component internally but never exposes that in the public API, it’s reasonable to consider that component as part of the unit under test and not try to mock it out.

And if you do need a mock (for something with hard-to-undo side effects), you set it up with stub response & canned data in the test fixture, and never assert anything against it in the test.

Part of the problem is that most test fake libraries support mocks (usually they even have it in the name), those mocks have complex “verify the 3rd call was function doDoAction with parameter (‘yes’, ‘really’)” and then people think that full-blown mocks are the only way to guarantee 100% correctness.
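A minimal sketch of the stub-only style described above, with hypothetical names throughout (`StubRateService`, `convert`, the `rate_for` method). The stub returns canned data from the fixture, and the test asserts on the result of the unit under test, never on how the stub was called.

```python
class StubRateService:
    """Hand-written stub: hands back canned rates and records nothing."""

    def __init__(self, rates):
        self._rates = rates

    def rate_for(self, currency):
        return self._rates[currency]


def convert(amount, currency, rate_service):
    """Unit under test: converts an amount using whatever rate service it is given."""
    return round(amount * rate_service.rate_for(currency), 2)


def test_convert_applies_rate():
    stub = StubRateService({"EUR": 0.5})  # canned data set up in the fixture
    # Only the observable result is checked; there is no "verify call #3" step.
    assert convert(10, "EUR", stub) == 5.0
```

If `convert` later caches rates or calls the service twice, this test keeps passing as long as the answer is still right.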

Unit tests don't guarantee that your code units are well designed. You can still write unit tests for spaghetti code. In fact, sometimes it encourages it because it encourages antipatterns like dependency injection; because developers get the urge to inject mock objects into the units from the unit tests.

Just think about it at a high level. If your module depends on another module, is it really always necessary that this module be substitutable for another module that has the same interface (e.g. another similar module or a mock)? 99% of the time, the answer is no! In fact it's often a problem because this wrongly assumes that if a class exposes the required interface, then it is compatible. There is more to compatibility than just interface; if the submodule maintains its own state (OOP), then its behaviour can change in a way which could break the dependent logic. You can't have Dependency Injection for everything because it's never so simple; these customizable parts of the code have to be designed very carefully.

I actually don't get where you are going with this breakdown. Can you give me an example.

I mean when you have a class which, instead of importing/including/requiring its dependencies directly, expects instances of various classes to be passed into its constructor for example. So the class cannot do its job unless you pass its dependencies into its constructor... This is an antipattern which unit tests tend to encourage because it allows you to decouple the unit logic from its dependencies and it makes it easy to inject mocks in the place of dependencies.
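To make the two styles concrete, here is a small sketch with invented class names. The first class reaches for its dependency directly; the second expects it through the constructor, which is the pattern the comment objects to and also what makes swapping in a fake trivial.

```python
import time


class DirectReportGenerator:
    """Uses its dependency (the system clock) directly."""

    def generate(self):
        return f"report at {int(time.time())}"


class InjectedReportGenerator:
    """Cannot do its job unless a clock is passed into the constructor."""

    def __init__(self, clock):
        self._clock = clock

    def generate(self):
        return f"report at {int(self._clock())}"


# Production wiring passes the real clock...
real = InjectedReportGenerator(clock=time.time)

# ...while a test passes a fake one and gets a deterministic result.
fake = InjectedReportGenerator(clock=lambda: 1_000)
assert fake.generate() == "report at 1000"
```

Whether the extra constructor parameter is worth it is exactly what the thread disagrees about; the sketch only shows the mechanics.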

A unit test should only test a single class so that means you need to mock out all other classes which your class depends on. Mocking out dependencies in the test code is difficult or not feasible and that's why developers often resort to Dependency Injection in their source code but as mentioned before, it is an antipattern.

Also, I noticed that a lot of developers confuse unit tests with integration tests but the definition is quite clear: If your test covers more than one class without mocks, then it's an integration test. Integration testing does not necessarily mean end-to-end testing. It could just be a single class with its internal dependencies.

Passing dependencies in the constructor is the only acceptable way of having stateful dependencies (like a database or some external services). I am assuming you are against passing things like Math libraries by creating an instance, which I would agree with.

The only reason dependency injection gets a bad rep is magical frameworks which obscure the actual wiring and end up causing bigger problems.

I also would not pass in a database client instance. I would pass in the config for the database client though.

I think that the class which interacts with the database directly via the client should be tightly coupled to the database client. It's not very often that you change database and when you do, you can just swap out that entire class completely. Classes which interact with the database should expose simple interfaces for performing actions against the database and those wrappers should be replaceable.

Then how would you handle connection pooling, if every class which interacts with the database has its own instance of the database client?

Surprisingly, the article doesn't engage with this argument! Unless I missed it, the author never mentions that unit tests can make refactoring other code easier.

In my experience, after code is checked in, unit tests have two main purposes:

1. Checking that functions aren't completely broken in dynamically typed languages (i.e. a check so basic that a static type checker can do its job)

2. Allowing programmers to refactor code without being terrified it will break something far away in the execution path

And like, that's it. That's enough to make them useful to have around.

I don't think I've ever seen unit tests that didn't need to be changed fundamentally to deal with any sort of refactoring more complex than extracting a function.

Unit tests are supposed to work with the internals of a module and not just its boundaries. Since refactoring normally tries to significantly change the internals without changing the boundary, it usually also results in a rewrite of the unit tests. Integration tests are the ones that usually help me sleep better after a refactor.

I don't think I agree with your premise, but even if I take it as true--- it's the unit tests of the other modules that help me out, by ensuring I haven't managed to totally bust the module I'm working on.

Some people argue that a unit test of a module should mock out its dependencies and not rely on them. For unit tests like that... well, I endorse the article.

The thing is that most people don't understand that the code needs to be tested anyway. If you don't test it, you are just an idiot who has no proof that the code does what it claims to do.

Now because you have to test it anyway, you can just simply spend the same time writing a unit test instead of executing the code with manually configured non-repeatable test cycles, a.k.a clicking through the UI, or sending Postman requests, etc.

Also it's kind of selfish not to make something repeatable by others.

And as someone pointed out before me, unit tests are not about testing at all. They're documentation of the system and what it's supposed to do. They're also the way to stop the next person who works on the code from ruining something by not knowing the business rules.

> the code needs to be tested anyway. If you don't test it, you are just an idiot who has no proof that the code does what it claims to do

First, in most cases you're going to test your code manually anyway- whether you write unit tests or not. So writing unit tests is just in addition to the time already spent testing manually.

But most importantly, unit tests often do very little to prove "your code actually does what it claims to do". They prove you have tested some cases, that's all. On many occasions, the code under test doesn't even "claim" to do anything in particular: the claim is that you're delivering a feature, not pieces of code that behave as expected. All your code fragments can behave exactly as expected and still the feature might be broken, or miss implicit requirements, or be ill conceived.

Finally, unit tests as documentation are disastrous, as they document only details of specific implementations, as opposed to the feature you're expected to deliver. Good comments are a hundred times better than tests in that respect.

> First, in most cases you're going to test your code manually anyway

Only if you are coding things with UIs. If you are creating a new API endpoint on some service, creating a new service, or adding some new parts to a maths library, there is no point in manual testing.

I feel a lot of the disagreements people have on development methodologies come down to people working in different domains. When I do UI work I do mostly manual testing, when I do library work I do only unit testing.

Not all forms of testing consist of separate test suites like unit tests. Assertions, contracts, encoding constraints in elaborate type systems and having them checked at compile time are also ways of coding repeatable tests.

And if it's some poorly coded legacy application or something that's mostly wrapping a proprietary black box, working around other people's bugs, and things like that, then system or integration tests may be easier than attempting to mock a gordian knot's tangles.

Unit tests work best for properly encapsulated pieces of code, utility classes, libraries and such.

Yes, code must be tested. Line and branch coverage just don’t have the actual impact that some people claim they do. They are, however, better than nothing at all.

The arguments in the paper are anecdotal and I took them to be “question the cargo cult” which is a valuable thing to do. The author is merely encouraging you to think because “automated garbage is still garbage.”

Good read.

> If you don't test it, you are just an idiot who has no proof that the code does what it claims to do.

Does having tests make it any different? You would need to test those tests to make sure they are testing the right thing after all (ad infinitum).

Untested tests are still a lot more useful than no tests, just like how an untested program (while probably worse than a tested program) is still usually better than no program at all.

Also, tests are actually subject to very basic testing in the red-green-refactor TDD cycle: they start out failing so you know that they actually test something when they start passing.

To add to this sentiment, tests also give teams confidence that is needed. It gives engineers confidence when developing new features because the test suite will break if you introduce a regression. It gives engineers confidence during deployment/CD.

While the tangible benefits of unit tests are very important, there are other intangible benefits that are equally important.

I’d say that confidence is a tangible factor.

The single most important reason to test your code seems to be that it allows you to refactor your code while ensuring it still meets the original outside expectations.

The author’s main gripe seems to be about tightly coupled tests that make refactoring larger systems more difficult, and about prioritizing meeting arbitrary metrics (lines of code covered) rather than thinking critically about the actual benefits.

This is in line with his emphasis on integration tests determined by the business. I think that same thought process can be applied to internal components of code as well, and you can empirically determine the quality of your approach (roughly) by evaluating how often your tests change. If nearly every change breaks a test, that probably means they’re low value/too tightly coupled.

Excessive mocking seems to be the biggest source of evil in that regard.

What confidence do unit tests add that other kinds of tests do not?

The confusion here between unit testing and automated testing in general kind of illustrates the author’s point: we’ve become so obsessively focused on one kind of testing that we aren’t thinking critically about its alternatives (of which there are many others besides “no tests”).

Code needs to be tested anyway, but integration and functional tests can accomplish that more cost effectively.

Cannot tell you how many times I have seen projects with hundreds of green unit tests at 90% line coverage that nevertheless regress, miss important and obvious cases, or simply have never had working headline functionality.

"The cross product of those paths with the possible state configurations of all global data (including instance data which, from a method scope, are global) and formal parameters is indeed very large. And the cross product of that number with the possible sequencing of methods within a class is countably infinite."

Sounds more like a condemnation of OOP than of unit testing, and I do genuinely feel sorry for the unit testing, OOP purists out there. I prefer to design more functional methods, which operate on parameters and injected config instead of instance state and/or globals (cringe). Incidentally, this approach makes full coverage attainable.
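To make that concrete, here is a minimal sketch in Rust. `ShippingConfig` and `shipping_cost` are hypothetical names invented for illustration, not anything from the paper. The function reads only its arguments, so every branch can be reached by choosing inputs.

```rust
/// Hypothetical configuration, injected by the caller instead of read from globals.
pub struct ShippingConfig {
    pub base_cents: u64,
    pub per_kg_cents: u64,
    pub free_over_cents: u64,
}

/// Pure function: the result depends only on the arguments, never on hidden state.
pub fn shipping_cost(order_total_cents: u64, weight_kg: u64, cfg: &ShippingConfig) -> u64 {
    if order_total_cents >= cfg.free_over_cents {
        0
    } else {
        cfg.base_cents + weight_kg * cfg.per_kg_cents
    }
}
```

With only two branches and no instance state, two or three calls with different arguments cover the whole function.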

"Large functions for which 80% coverage was impossible were broken down into many small functions for which 80% coverage was trivial. ... Of course, this also meant that functions no longer encapsulated algorithms. It was no longer possible to reason about the execution context of a line of code in terms of the lines that precede and follow it in execution"

I can reason about such an implementation MUCH more effectively by glancing at the small bit of higher level code which integrates everything (and as mentioned above by forsaking instance state and polymorphism). This strikes me as a bit like advocating a flat directory structure cause it's important to be able to see all your files at once.

Yeah a lot of the 90s/early 2000s OOP stuff I learned in school seemed to always result in really tightly coupled systems and bespoke webs of tests and fixtures that strung along weird dependency chains in unwieldy spaghetti piles that did no good. Following TDD has helped me land at decoupled functional interfaces like the ones you've described, and it all scales and composes so nicely, yet stays very tractable.

Robert Martin sketched 2 diagrams in [1] that elegantly illustrate these two different design patterns and how testable usually means composable and more tractable:

[1] http://blog.cleancoder.com/uncle-bob/2017/03/03/TDD-Harms-Ar...

I wish he gave actual examples to illustrate what he was talking about. So you have an additional API layer the tests hit to call the functions you’re testing? Do endpoints map to classes? Modules? So aren’t you just tightly coupling this API to your service? When the service changes, you still have to update the API. Can you explain to me how this solves the problem?

One of the things I dislike the most about unit tests is how it's used for "quality theater". Some examples:

- Trying to use coverage percentage metric as a sign of quality. As if a simple percentage means anything about the quality of the tests. It's as useless as using lines of code as a way to measure progress.

- Not recognizing that useless unit tests are harmful for the code maintenance. It makes refactoring code into better structure difficult and developers just give up because it's too much work to fix the tests that weren't even providing any value in the first place. This ignorance is expressed in statements like how there's a testing "pyramid" with unit tests at the bottom and end to end tests at the top. Which is a nice sounding soundbite and image, but is useless. Forget pyramids, just write tests where it makes sense.

- Code review comments that try to look smart with "where's the unit test?" Then the developer doesn't want a long drawn out fight about how a unit test would be useless here since there's a huge crowd just cargo cult yelling "code coverage!" "unit tests are good!" "pyramid!" So the developer just writes the stupid test to get the code merged. This is also an example of how harmful code reviews can sometimes be, when there's a popular stupid idea, code reviews perpetuate it because the developers who know better just get tired of fighting the same fight over and over again.

I really hate useless unit tests. I've seen tests that set up a mock, configure it to return a value when called, call the code using the mock, then check the returned value is the mocked value. This tested absolutely nothing! I've seen unit tests that verify every line of the method was called, completely pointless. The point is supposed to be to verify that for a given input, it has a given output, not lock it into a specific implementation by verifying every line of code ran a certain way.
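A rough Rust sketch of the difference, using a hand-written stub rather than a mocking library. `PriceSource`, `FixedPrice` and `order_total` are made-up names for illustration. The first test only checks that the stub echoes its configured value; the second checks an input/output property of the code under test.

```rust
/// Hypothetical dependency: something that looks up a price in cents.
pub trait PriceSource {
    fn price_cents(&self, sku: &str) -> Option<u64>;
}

/// Code under test: total for a list of SKUs, skipping unknown ones.
pub fn order_total(source: &dyn PriceSource, skus: &[&str]) -> u64 {
    skus.iter().filter_map(|sku| source.price_cents(sku)).sum()
}

/// Hand-written stub standing in for a mocking library.
pub struct FixedPrice(pub u64);

impl PriceSource for FixedPrice {
    fn price_cents(&self, _sku: &str) -> Option<u64> {
        Some(self.0)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Tautology: this only proves the stub returns what it was told to return.
    #[test]
    fn stub_returns_stubbed_value() {
        let stub = FixedPrice(250);
        assert_eq!(stub.price_cents("any"), Some(250));
    }

    // Input/output check: exercises the summing logic in `order_total`.
    #[test]
    fn total_adds_up_each_line() {
        let stub = FixedPrice(250);
        assert_eq!(order_total(&stub, &["a", "b", "c"]), 750);
        assert_eq!(order_total(&stub, &[]), 0);
    }
}
```

The second test would still pass if `order_total` were rewritten with a plain loop, which is the property you want from a test that is meant to survive refactoring.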

There is an idea for structuring software "functional core, imperative shell". Write the software this way and the natural place for unit tests and integration tests becomes obvious. But nope, the industry is all about unit test coverage percentage, stupid pyramids, looking good in code reviews. It's all quality theater, not actual focus on quality.
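For anyone who hasn't seen that structure, a small sketch in Rust, assuming a toy program that sums numbers read from standard input (`parse_and_sum` and `run` are illustrative names). The core is a pure function that unit tests can hammer; the shell only does I/O and is a candidate for an integration test, if it gets a test at all.

```rust
use std::io::{self, BufRead};

/// Functional core: pure logic, no I/O, easy to unit test.
pub fn parse_and_sum(lines: &[String]) -> Result<i64, String> {
    let mut total = 0i64;
    for (idx, line) in lines.iter().enumerate() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // blank lines are ignored
        }
        let n: i64 = trimmed
            .parse()
            .map_err(|_| format!("line {}: not a number: {:?}", idx + 1, trimmed))?;
        total += n;
    }
    Ok(total)
}

/// Imperative shell: reads stdin, calls the core, prints the outcome.
pub fn run() -> io::Result<()> {
    let lines: Vec<String> = io::stdin().lock().lines().collect::<Result<_, _>>()?;
    match parse_and_sum(&lines) {
        Ok(total) => println!("{}", total),
        Err(msg) => eprintln!("error: {}", msg),
    }
    Ok(())
}
```

Nothing in `parse_and_sum` needs a mock, because nothing in it touches the outside world.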

> I really hate useless unit tests. I've seen tests that set up a mock, configure it to return a value when called, call the code using the mock, then check the returned value is the mocked value. This tested absolutely nothing!

I've had a similar disagreement about the validity of such a test.

The other mocking issue I've seen, is people mocking blackbox third-party APIs over which they have absolutely no control, which sometimes leads to passing unit tests, failing integration tests and head scratching all around.

I'm with you people on this mocking issue. It is about time that mocking be recognised as an anti-pattern in automated testing, or at least as an undesirable tool of last resort.

Mocks often contain the same bad assumptions and misunderstandings about the mocked API which the developer used during the implementation of the unit they are trying to test.

If you feel the need to mock something then you should first ask yourself whether an integration test can do the job for you. Actually, I would generalise this advice to: Don't write a unit test when you can write an integration test.

> If you find your testers splitting up functions to support the testing process, you’re destroying your system architecture and code comprehension along with it. Test at a coarser level of granularity.

The emphasis here should be on the reason for splitting up functions. Long, complex functions can be difficult to understand, and removing a few lines in exchange for a (well named) function call is very beneficial for the reader. The opportunity for testing comes from this delegation. A function call is a contract, and the test ensures it complies. Now the reader can comprehend what the code is doing at a higher level; trusting the sub functions do what they intend.

It's quite the opposite: The whole point of the architecture is often to facilitate testing. If we didn't have to write tests we could do without a lot of that boilerplate.

Doing a typical "front-end" mobile application I write the following tests:

* Integration tests that test the API. I get a ton of value out of them, though it's sometimes hard to guarantee a certain state at the API's end. I use them while building against the APIs, but they can also detect any kind of problem after an API upgrade, etcetera.

* Unit tests that test data transformations. This is stuff like date formatting, building headers into a request given certain input, building more complex ViewModels that take a few structs as input and turn them into something that reflects the actual process happening in the view. They're valuable while building the logic, help separate the logic since you need to make it testable, and also help a ton if you find a bug, since a bug simply means adding another test case to see if something goes wrong and then fixing it.
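As an illustration of the second kind of test, a small Rust sketch of a header builder. The function name, the header set and the bearer-token scheme are assumptions made for the example, not details of the app described above.

```rust
/// Hypothetical request-header builder: plain data in, plain data out.
pub fn build_headers(token: Option<&str>, locale: &str) -> Vec<(String, String)> {
    let mut headers = vec![
        ("Accept".to_string(), "application/json".to_string()),
        ("Accept-Language".to_string(), locale.to_string()),
    ];
    // Only authenticated requests carry an Authorization header.
    if let Some(t) = token {
        headers.push(("Authorization".to_string(), format!("Bearer {}", t)));
    }
    headers
}
```

A bug report such as "logged-out requests send an empty Authorization header" then becomes one more assertion against `build_headers(None, ...)`.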

I don't think I even get to 30% code coverage but I think what I cover is super valuable and the other 70% is usually mostly CRUD boiler plate.

I personally find testing incoming APIs a waste. Sometimes I write tests to verify that I treat the incoming data correctly if I transform it in fancy ways, but other than that, you should trust the API contract. It's almost like writing tests for the frameworks you're using.

Granted, that's from a unit test perspective. Integration tests of APIs are invaluable. I wish we had them already where I currently work.

> but other than that, you should trust the API contract

As someone who's been working in corporate integration for the past 6 years..

Never trust a contract. Be it WSDL, OpenAPI spec, Word documents or otherwise. I've worked with large tech vendors, I've worked with finance, I've worked with large consultancies. The only people that seem to get it right are the people you don't want to work with because it's soul-crushing - think HL7 et al.

The point is you can't write unit tests for broken API contracts, so either you trust that the contract is upheld, or you've got bigger problems on your hands.

Do you mean integration tests? I don't think API contracts should be a part of a unit test. A unit test should be self-contained, unless of course there is a tight coupling to an API.. Which I would steer clear from.

That's why I value integration tests over most things. I can see immediately that something is broken at a high level, what business impacts it has and explain what systems are affected.

I got progressively more and more frustrated with this article, largely because he keeps making statements about the impossibility of covering all states a class may take on (true!) but then followed up with espousing more use of integration and system tests, which are clearly combinatorially harder to test “completely”. He also implied at the very beginning that somehow this was driven by the switch from FORTRAN to OOP, as if FORTRAN avoided this combinatorial explosion (magic!) and it’s only because of polymorphism that we live in the “unit testing is good” world.

The logical contradictions eventually overcame me.

> espousing more use of integration and system tests, which are clearly combinatorially harder to test “completely”

The point is that when you do integration tests, you will test the underlying classes the way in which it actually matters. You can't test every possible condition a unit can have, but you can test for the most likely.

This is my experience too. Realism in testing is criminally underrated while code coverage is criminally overrated.

IME unit tests can only effectively substitute for integration tests where you're testing logical/algorithmic code with simple function inputs/outputs.

I somewhat agree, somewhat disagree. IMO the largest value I get from unit tests is the confidence that I can make changes and understand what breaks. Refactoring w/o unit tests makes me feel like I am flying blind

The units we test are almost never big enough for internal refactoring. It’s the decomposition into units that wants refactoring, and there the test suite actively fights back (mock expectations in particular).

If there were enough code involved in a test that we could meaningfully refactor it while keeping the test green, we would call it an integration test.

This is no more or less likely in an integration test vs a unit test. One of the first things you learn in writing unit tests is that you should be selective in picking your inputs. This is even taught in those “trade schools” he subtly derides. You regularly see both public test cases and hidden test cases run against problem sets, at least that’s what my interns tell me.

Interesting so they don't teach testing by deliberately trying to break things, pasting Arabic text into an address field for example.

When testing an API I like to use edge cases like the company with the longest name, e.g. "Donaudampfschiffahrtsgesellschaft", or the famous Welsh location "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"

I am pretty sure I crashed an A17 clearpath mainframe by doing aggressive testing

This is the sort of input that you'd selectively pick in a unit test, too. What you don't do is also pick inputs like "llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch" (all lowercase) :)

That's QA, not the sort of testing we are talking about. In case QA happens to find some bug like that, after the fix, we should include another test case with the formerly breaking one

For the same number of lines of test code, an integration test case traverses through much more source code than a unit test case. Not only that, but it also checks that the units are interacting properly and passing each other the right data and returning the right data. Good integration tests can also free us from static typing because the integration tests implicitly check that the interface contracts between all the different units are met internally. They also verify many other factors which affect the robustness of the software. For example, they can identify race conditions and prevent them in the future.

True, but they're also slower, so you'll have to either test fewer cases or run the tests less often.

They are usually slower but they can still be pretty fast. Also, with integration tests, you can choose the granularity of your test. You don't need to test end-to-end (of course this is the slowest), you can just test a single module with all its dependencies. If you don't mock out dependencies, then it's still technically called an integration test.

I saw cases where there really was a waste of stupid unit tests, and no significant tests for the interaction of those units, so I can see where the author is coming from. But my conclusion, after many years of developing with different levels of testing, is that it's a good idea to:

* Have at least one unit test for each nontrivial unit, but no unit tests for the trivial ones. But, in these unit tests, use mocking and stubbing only when using real objects isn't feasible, or very inconvenient - isolation is nice to have, but generally overrated in my experience

* Create as many unit tests as necessary to feel confident about the edge cases for the most critical units

* Implement higher level tests covering the main use cases for the interaction of all the units

I'm going to pile onto your comment, since I heartily agree.

> In a given computing context, the exact function to be called is determined at run-time and cannot be deduced from the source code [in OOP languages] as it could in FORTRAN.

This is not necessarily true, and I don't think this was true when this was written, either. (The Wayback Machine first saw this in 2014, and the paper doesn't date itself.)

Most member function calls in C++ (non-virtual ones) can be resolved at compile-time. Where idiomatic C++ would use a virtual function, any language would require some equally uninspectable construction, because one would use it when one needed run-time switching. Rust lacks OOP in the usual sense, but if I needed a virtual function of sorts, I might reach for a trait type, which also wouldn't be inspectable (beyond whatever semantics the trait establishes).
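A small Rust sketch of the two cases, with made-up `Shape`, `Square` and `Circle` types. The generic function is resolved per concrete type at compile time; the `dyn Shape` version picks its callee through a vtable at run time, much like a C++ virtual call.

```rust
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);
pub struct Circle(pub f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

/// Static dispatch: monomorphized for each `S`, so the callee is known at compile time.
pub fn area_static<S: Shape>(shape: &S) -> f64 {
    shape.area()
}

/// Dynamic dispatch: each call goes through the trait object's vtable at run time.
pub fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

Only `total_area` has the property the paper complains about, and it has it by design: the caller asked for run-time switching.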

There's Java, JS, and Python, I do admit. And I find my unit tests often fill in the static analysis I wish I had. But the truth is more nuanced than I think the author conveys.

> Unit tests are unlikely to test more than one trillionth of the functionality of any given method in a reasonable testing cycle. Get over it.

> (Trillion is not used rhetorically here, but is based on the different possible states given that the average object size is four words, and the conservative estimate that you are using 16-bit words).

This is as vapid a definition of test coverage as the line coverage the author derides in the prior paragraphs. A trivial function,

  fn add_these(a: u64, b: u64) -> u64 {
    a + b
  }

could not be adequately unit-tested, because I could never cover the many trillions of input states it has.

I think there's always an implicit line to "well, if the code under test is absolutely nuts", it's not the test's fault for not catching it, and that if the function under test (with the same contract as the function above) is errantly coded something like,

  fn add_these(a: u64, b: u64) -> u64 {
    if a == 11771008849893880921 && b == 5668622331333919113 {
      0 // wrong on purpose for this single input pair
    } else { a + b }
  }

then what amount of testing is going to catch that?

> I define 100% coverage as having examined all possible combinations of all possible paths through all methods of a class, having reproduced every possible configuration of data bits accessible to those methods, at every machine language instruction along the paths of execution.

While this somewhat contradicts the earlier example of states, I don't think we should aspire to this, since I think many "combinations" are not interesting to test; they're just pieces of the code that really don't interact; testing them is redundant. Yes, this hurts our ability to formally define a protocol for testing or not testing, but it prevents the inevitable combinatorial explosion.

> Business people, rather than programmers, should design most functional tests.

Absolutely not. A large part of my job is coming up with the actual requirements of the system from the vague machinations of business people who barely know how a computer works, let alone can specify salient requirements. How are they to write the tests, when they can't clearly articulate the requirements?¹

> Turn unit tests into assertions.

But I want to ensure these don't fire under normal circumstances. We do this, in my current code base, and we get notified of them, and of any crashes in general. But it's a failure that was noticed, and might have downstream consequences, as opposed to one caught by a unit-test.
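For reference, a sketch of what "turning a test into an assertion" can look like in Rust. `split_evenly` is a hypothetical function chosen for the example. `assert!` is always checked, while `debug_assert_eq!` is compiled out of release builds by default, so where the check fires depends on the build.

```rust
/// Splits `total_cents` into `parts` shares that differ by at most one cent.
pub fn split_evenly(total_cents: u64, parts: u64) -> Vec<u64> {
    // Precondition, checked in every build.
    assert!(parts > 0, "split_evenly() needs at least one part");
    let base = total_cents / parts;
    let remainder = total_cents % parts;
    let shares: Vec<u64> = (0..parts)
        .map(|i| if i < remainder { base + 1 } else { base })
        .collect();
    // Postcondition a unit test would otherwise check: no cent is lost or created.
    debug_assert_eq!(shares.iter().sum::<u64>(), total_cents);
    shares
}
```

The assertion checks every call that ever happens, but it can only fail at run time, which is exactly the trade-off described above.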

Then there are the parts about how Eastern Europeans with bad Internet make better programmers because they had to think instead of using the Internet, and how he grew up under similar circumstances. Real Programmers!

¹I'm not saying business types "should" (or should not) be able to articulate requirements/specifications.

Basic code coverage analysis should catch that add_these doesn't have full branch coverage.

Additionally, some fuzzing tools can actually do branch analysis and target your "rare" codepaths. Now, sure, figuring out how to hit some codepaths basically requires brute force - it's not going to magically figure out an efficient way to collide SHA1s if that's what it takes for your test to fail - and you need a way to differentiate good from bad behavior (this might be a reference implementation, or this might be as simple as "does it crash?") - but you do have to get crazier than add_these before you hit the limits of even existing tooling to catch bugs.

> then what amount of testing is going to catch that?

Mutation Testing could. :) https://ai.google/research/pubs/pub46584

Luckily it's not only a research result: actual libraries and tools exist for a lot of languages. See this list: https://github.com/theofidry/awesome-mutation-testing
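The core idea can be sketched by hand in Rust, without any real mutation tool. Everything below is illustrative: a tool would generate the mutant automatically, and the "suites" here are plain functions rather than a test harness. A mutant that survives a suite points at behavior the suite never checks.

```rust
/// Original implementation.
pub fn add_these(a: u64, b: u64) -> u64 {
    a + b
}

/// Hand-made "mutant": the operator is flipped, as a mutation tool might do.
pub fn add_these_mutant(a: u64, b: u64) -> u64 {
    a - b
}

/// Weak suite: passes for any function that returns `a` when `b` is zero.
pub fn weak_suite(f: fn(u64, u64) -> u64) -> bool {
    f(0, 0) == 0 && f(7, 0) == 7
}

/// Stronger suite: adds a case where `+` and `-` disagree.
pub fn strong_suite(f: fn(u64, u64) -> u64) -> bool {
    weak_suite(f) && f(7, 5) == 12
}
```

Here `weak_suite(add_these_mutant)` is true (the mutant survives) while `strong_suite(add_these_mutant)` is false (the mutant is killed), and both suites accept the original.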

Same here. That article just mixes true facts and weird ideas without coming to a real punchline at the end.

At least, I didn't take away more than 'throw away tests that haven't failed in a year' and I don't agree.

Ironically modern fortran is object orientated...

When I started programming, I'd think up lots of clever ways to avoid repeating things. Ten years on, that code is a nightmare to maintain because changing the behaviour for just one call site of hundreds is next to impossible, because everything gets funnelled through one extremely DRY group of modules.

Redundancy vs. Dependencies: it's the dependencies which kill you. Redundancy is often a good thing.

If you state an algorithm only once, as the implementation, then the next programmer only knows what it does, not whether it's correct.

This is, in my opinion, the main value of unit tests: state the algorithm twice, once as implementation and once as expectations, and if they don't agree then something's wrong. While the odds of a bug existing in any line of code haven't changed, the odds that the exact same bug exists in both sides are much lower.

Any bugs which survive that are probably in design rather than implementation, ie. my mental model of what the module should achieve is wrong somehow. Catching that is a job for integration tests.
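A sketch of "stating it twice" in Rust, assuming a sort as the algorithm. The implementation says how; the expectation says what (ordered, and a permutation of the input) without sharing any code with the implementation, apart from leaning on the standard library sort as an oracle.

```rust
/// Implementation: an insertion sort, written imperatively.
pub fn insertion_sort(items: &mut [i32]) {
    for i in 1..items.len() {
        let mut j = i;
        while j > 0 && items[j - 1] > items[j] {
            items.swap(j - 1, j);
            j -= 1;
        }
    }
}

/// Expectation, stated independently of the implementation above.
pub fn is_sorted_permutation(before: &[i32], after: &[i32]) -> bool {
    // Every adjacent pair must be in non-decreasing order.
    let ordered = after.windows(2).all(|w| w[0] <= w[1]);
    // Same multiset of elements: compare canonical (std-sorted) copies.
    let mut a = before.to_vec();
    a.sort_unstable();
    let mut b = after.to_vec();
    b.sort_unstable();
    ordered && a == b
}
```

An off-by-one in the loop bounds and a wrong comparison in the expectation are unlikely to cancel each other out, which is the whole point of the redundancy.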

I definitely agree with the 80/20 rule here. 100% code coverage is neither necessary nor desirable, and 20% is fine if it's the most valuable 20%.

Unit tests = the assurance that if something changes the logic, a test will break and you will know it. Beyond just limiting bugs, this is the more powerful benefit.

95% of the time when a unit test fails for me, it is the test that needs fixing instead of my code. It hardly inspires confidence.

To me this suggests that there is no clear understanding what your units are supposed to do.

My second theory would be that the unit tests are written by the inexperienced developers while the good developers write the other code.

The tests will involve 5 mocked dependencies just to test a simple if statement. They test the implementation and are a waste of time.

Isolated units that are well tested usually don't need to be updated in my experience so they rarely break.

Tests that have 5 mocked dependencies are usually a sign we done messed up somewhere.

That's also my experience, but consider how many of the remaining 5% would have led to a regression that would have gotten shipped to production if the test hadn't caught it. In my experience, this number is way above zero.

Is that because you're not writing tests while you're adding code and just breaking old tests?

Having a robust test suite is very nice. Having an overly pedantic test suite is not. Things like getters / setters that use the same library you've used 1000 other places have no need to be tested again. Add tests for the base concepts and trust that they'll continue working based on those.

Kent Beck (yes, the Kent Beck): “I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence”


> If you want to reduce your test mass, the number one thing you should do is look at the tests that have never failed in a year and consider throwing them away. They are producing no information for you — or at least very little information.

I feel like there is some subtlety here that most will miss. Specifically, "and consider". That is the part we are really bad at.

Those tests are helpful when you do major architectural changes, which may not happen every year.

Tests also help in general development. Make some changes, then fix the tests. If you are running tests locally on your machine (and offline) how can you be sure that every time a test fails locally the failure is logged?

There's a difference between tests that haven't broken because they are in stable code, and tests that haven't failed because they are testing tautologies or even just completely failing to break when the underlying code is broken.

I've lost count of how many times I've gone into an old test suite at work and found that the tests were still passing even though the code they were testing had completely changed or been removed.

Sometimes tests are written very poorly. The codebase benefits from their removal.

... and you don't always know what is a "major architectural change" that's going to break one of these tests, so you can't just relegate those tests into a category that is only run for major architectural changes.

The tests that never failed for years are usually unit tests for fundamental things. Say your app has a math library at the bottom as in my example. It has thousands of tests for things like vector additions, matrix multiplication and so on. And of course these never fail - they have no dependencies and we never modify it anymore. But we might modify it, say to plug in some other library to do math in 5 or 10 years. I don’t want to delete them for that reason. Luckily this kind of low level test usually takes very little time - we have a 15k test suite and the 5k “fundamental” tests take seconds while the whole integration level takes hours.

I've never tested 100% of my code.

But I've also never written a unit test that didn't expose bugs.

That's mainly because I know what parts of the code will be dicey, and focus unit tests on them.

That’s funny. I only write tests to prove that the program does what it’s meant to do and not to find bugs.

The goal is to prevent a future programmer, including myself, from breaking any of the declared properties.

Tests don’t prove anything, they just show the intended behaviour. Only full verification proofs show correctness.

Too often I’ve seen “tests show it’s correct” suites horribly fail to provide value when the behaviour changes in a unit of business logic and the brittleness needs to be unwound and replaced with robust assumptions.

For me, it’s all about maximizing quality; which is usually about bugs. For example, I’m currently writing an ONVIF driver. This needs to address hundreds of devices. I only have about ten to test against. They feature a wide range of implementation, but they are, by no means, full coverage. I’m gonna need to mock up some crazy, nonexistent devices to stress a few particular parts of the library. If I can make it work with them, then chances are better than even that it will work with devices I can’t directly test. I use test harnesses to ensure functionality. My test harnesses tend to be huge; often, a lot bigger than actual shipping implementations.

In practice I personally tend to do the reverse. If I encounter a bug I create a unit test to replicate its circumstances and use it to ensure that the bug stays dead and gone.
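A tiny sketch of that habit in Rust. The function and the bug are invented for the example: assume an earlier version indexed the first byte directly and panicked on empty input.

```rust
/// Hypothetical function that (in this made-up history) once panicked on "".
pub fn first_char_upper(s: &str) -> Option<char> {
    s.chars().next().map(|c| c.to_ascii_uppercase())
}

#[cfg(test)]
mod tests {
    use super::*;

    // Regression test: reproduces the circumstances of the original bug report.
    #[test]
    fn empty_input_does_not_panic() {
        assert_eq!(first_char_upper(""), None);
    }

    // Sanity check that the fix did not change the normal case.
    #[test]
    fn normal_input_still_works() {
        assert_eq!(first_char_upper("abc"), Some('A'));
    }
}
```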

In terms of TDD I can't make sense of this. Do you only write tests after a bug has been found?

Doesn't sound like TDD, but being purely responsive doesn't follow either.

If I'm doing I/O on untrusted data, of course I'm going to fuzz test it thoroughly and probably find bugs.

If I'm writing new RAII types - containers, smart pointers, etc. - I'm going to write tests in an attempt to exercise double frees, or any kind of rule-of-3+ violation I can think of, and I'll probably find at least one thing I overlooked if it's complicated enough. The people building code need to be able to trust some of their foundational tools, at least, to be nearly bug free.

If I'm abstracting system APIs into a cross platform representation, I'm going to write unit tests to compare behavior, because the abstraction is probably leaky and fails to fully abstract in some edge case - possibly due to bugs in one of the N system APIs I'm targeting, or possibly because I simply forgot to implement a codepath, or possibly due to strange edge cases.

If I'm writing SIMD abstractions, I'm going to write some basic math unit tests to catch the occasional compiler codegen bug for the less thoroughly used and tested intrinsics.

If I'm writing code that I simply know to be fundamentally brittle, I write tests to catch when it eventually breaks. For example, I wrote some unit tests for Rust to catch when the modules that standard types are implemented within change internally, which will in turn break the .natvis files relying on those internal type names.

If I'm writing lock-free multithreaded code, I'm going to write a metric ton of tests to try and suss out edge cases, because I know a single bug slipping through can result in weeks of time lost to chasing heisenbugs, and I know my reviewers probably can't catch everything either, and you'd better believe it's going to catch a bug at some point.

This is very interesting, when you contrast it to [https://dl.acm.org/citation.cfm?id=3106270].

I'm always wary of arguments that rely on 'unit tests are more complex than the code'. If the tests really are more complex, that is as it should be. Tests have to encapsulate the software contract that's being enforced and should be more complex than the underlying code.

The utility of unit tests greatly depends on the type of software you're building. Building a DB without tests is difficult and unlikely to win many customers - particularly when news of that gnarly data race comes out. Whereas for a CRUD webapp, where everyone can see what's working and what's not, the value is a bit hit or miss.

At the very least testing helps communicate to a code reviewer "hey this thing does what I think it does".

When implementing lexers and parsers for language specifications, I have taken the approach of testing, for each EBNF symbol, every token (keyword, number, etc.) that the lexer must handle within that symbol. This repeats tests for common keywords and other constructions, but means I can see that the lexer handles all the tokens in each EBNF symbol. In this way, these tests are more BDD (behaviour-driven development) in style, but without things like Gherkin/Cucumber. With this approach, I do full coverage tests for a given symbol/token, and just check the basic symbol/token case elsewhere.

I do a similar thing with the parser tests -- have a test for each valid complete symbol production, and then tests for as many error cases as the parser can handle for that symbol.
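To give a rough idea of the shape, here is a sketch in Python with a toy token set. The `TOKEN_SPEC` table and `tokenize` function are made up for illustration and do not correspond to any real grammar; a real suite would have one table of cases per EBNF symbol.

```python
import re

# Hypothetical token set for a toy grammar; a real lexer would follow its EBNF.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:if|else|return)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# One row per token case; the table grows as symbols are added.
VALID_CASES = [
    ("42", [("NUMBER", "42")]),
    ("return", [("KEYWORD", "return")]),
    ("returned", [("IDENT", "returned")]),  # keyword prefix must not match
    ("x = 1", [("IDENT", "x"), ("OP", "="), ("NUMBER", "1")]),
]


def test_valid_tokens():
    for source, expected in VALID_CASES:
        assert tokenize(source) == expected, source


def test_unknown_character_is_reported():
    try:
        tokenize("x ? y")
    except SyntaxError:
        pass
    else:
        raise AssertionError("expected SyntaxError for '?'")
```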

With that base, I can add additional tests for bugs and additional error recovery cases I implement.

I never see unit tests as a waste as they provide a set of regression tests that are invaluable for refactoring and making other changes to the code, like implementing new features or better error handling.

Most insurance is waste

If your house didn’t burn in the last two years, consider not insuring it anymore.

I don't just code for a single project. I create libraries when I write a piece of code that I know will be useful down the line. And for those libraries that are going to be reused throughout many applications I think unit testing is gold.

Absent from this discussion is the use of generative testing. At the unit level they can automatically enumerate a variety of scenarios the programmer would likely overlook.
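As a hand-rolled illustration in Python, using only the standard `random` module rather than a dedicated property-based testing library (which would add smarter generation and shrinking of failing inputs): the run-length encoder below is a made-up example, and the property checked is that decoding always undoes encoding.

```python
import random


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encode: 'aab' -> [('a', 2), ('b', 1)]."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in runs)


def check_roundtrip(trials: int = 500, seed: int = 0) -> int:
    """Generate random inputs and assert properties instead of fixed examples."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        length = rng.randint(0, 20)
        # A tiny alphabet makes repeated characters (the interesting case) likely.
        text = "".join(rng.choice("ab ") for _ in range(length))
        encoded = rle_encode(text)
        assert rle_decode(encoded) == text, f"round trip failed for {text!r}"
        # Second property: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:]))
    return trials
```

The point is that the test states what must always hold, and the generator finds the empty string, single characters and long runs without anyone having to think of them.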

I just discovered this, only it was called 'property based testing'. Do you recommend any resources for learning to use it effectively? I haven't dug in yet, but so far it hasn't clicked when thinking about it. I like the idea a lot, but I guess I don't see where it would be most useful for my own code.

I run a one-man project and I write test-first unit tests for all of my code. This has enabled me to advance the code with new functions without breaking current APIs. The tests are not in the way and usually don't need to change when I change the implementation. It has been tremendous for productivity because I don’t need to worry about whether stuff breaks. I can trust my tests to catch breakages, and so far they always have.

After 15 years of writing unit tests, I haven't reached the same conclusion as the author.

What I've found in my time is that unit testing can be good, but like anything it's not a panacea. It requires discipline, and like normal code, it has code smells.

Black box unit tests are the most likely to be good tests, and white box unit tests are the most likely to be bad tests. The more you depend on the inner workings of a function in order to test it, the more likely it is that you are coupling your test to the implementation rather than the purpose of the unit being tested. Once you tie to the implementation, refactoring becomes a LOT harder, because changes will break the tests even if they don't break the functionality.

Mocks are also a major source of trouble, and more likely to be a code smell. If your tests are using a mock to test how many times your unit called it, either your tests are bad or your architecture is wrong.

There are three main kinds of code:

- Code that fetches data

- Code that stores data

- Code that transforms data

Mocks are necessary when you mix these. If you have a function that opens a DB connection, fetches data, transforms the data, and then stores the data, you now have an extra problem to deal with (the database), when all you wanted to do was test the transformation. Things would be far easier if you separated the transformation out, tested that in isolation, and then integrated that encapsulated functionality with fetching/storing code. This also improves separation of concerns and reduces code duplication, since now your fetching/storing code can be generalized and also tested in isolation.
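A small Python sketch of that split, with all names (`normalize_emails`, `run_pipeline`) invented for illustration: the transformation is a pure function, and fetching and storing are passed in as plain callables.

```python
from typing import Callable, Iterable


def normalize_emails(rows: Iterable[dict]) -> list[dict]:
    """Pure transformation: lower-case emails and drop duplicates. No I/O."""
    seen = set()
    result = []
    for row in rows:
        email = row["email"].strip().lower()
        if email in seen:
            continue
        seen.add(email)
        result.append({**row, "email": email})
    return result


def run_pipeline(
    fetch: Callable[[], Iterable[dict]],
    store: Callable[[list[dict]], None],
) -> None:
    """Thin integration layer: the only place fetching and storing meet."""
    store(normalize_emails(fetch()))


def test_normalize_emails():
    # No database and no mock library: just data in, data out.
    rows = [
        {"id": 1, "email": " Ann@Example.com "},
        {"id": 2, "email": "ann@example.com"},
    ]
    assert normalize_emails(rows) == [{"id": 1, "email": "ann@example.com"}]
```

In production `fetch` and `store` would wrap real database calls; in a test of `run_pipeline` a lambda and `list.extend` are enough.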

Actually I lie. There is a fourth kind of code: code that modifies state. This is the evilest, smelliest code around, and it's also something that unfortunately we can't get completely away from. But we can manage it, by isolating state, reducing the need for or scope of the state, and providing "configuration object" function entry points to make testing these monstrosities less nasty.

Code coverage is not just a measure of quality, but also of waste. If your code is not being called, then one of three things is happening:

1. It's error checking code for another API it's calling, which you normally shouldn't be writing tests for (unless that API is known to be buggy and you need to guard against it).

2. It's not contributing to the goals of the program, and can be taken out.

3. It does contribute to the goals of the program, in which case you need a test for it.

You can't reach 100% code coverage because of (1). But you absolutely should check WHICH code is covered in your tests because of (2) and (3). Anything higher than 80% coverage is pure luck, and tells you nothing about quality or wastage. In many cases, even 60-70% is sufficient.

If you don't use mocks then won't you just end up with integration tests?

The higher up you go in your component hierarchy, the more likely it is that you'll need a mock component to test them, but I've found that it's possible to structure your architecture such that you don't need to do it as often as you'd think. If a component is using another component simply to fetch some data it needs (VERY common in my experience), then you could restructure things such that you just pass that data into the component from outside in the first place, possibly even treating it as a simple transformer instead of an actor.

There's a limit to this, of course, but keeping this principle in mind in your design also has the effect of bringing coupling to a minimum or even zero in some cases, and promoting re-use of these now completely independent components. When components don't care where or how they get their data, they become a lot simpler due to the elimination of state (implicit and explicit). Elimination of state also facilitates making functions idempotent, which massively reduces cognitive load when reasoning about a system.

Who cares what they’re called as long as they’re fast and don’t crash when running at scale/fill the disk or memory

Because if you're not using mocks the tests get unwieldy. Suddenly changing components three layers down the dependency tree breaks "unit" tests at the top. Those are the tests you just end up @ignore'ing.

Mock things like database access, or better yet pass the database connection via an interface and create a test version of the database access so you can create a database-like thing for your tests (e.g. an in-memory database).
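For example, a sketch in Python using the standard `sqlite3` module, where the `UserStore` class and its schema are hypothetical: the class receives its connection from outside, so a test can hand it an in-memory database instead of a mock.

```python
import sqlite3


class UserStore:
    """Hypothetical data-access class that receives its connection from outside."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name: str) -> int:
        cursor = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()
        return cursor.lastrowid

    def find(self, user_id: int):
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def test_user_store_round_trip():
    # ":memory:" gives a real SQL engine that vanishes when the connection closes.
    conn = sqlite3.connect(":memory:")
    try:
        store = UserStore(conn)
        user_id = store.add("ada")
        assert store.find(user_id) == "ada"
        assert store.find(user_id + 1) is None
    finally:
        conn.close()
```

This only stands in for the real thing if production also speaks a compatible SQL dialect; that is an assumption of the sketch, not a given.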

If you are depending on another component of the application (a lexer, a JSON class, a maths function, etc.) don't mock that because if that class breaks you want tests to fail instead of silently passing because the broken class/function was mocked.

If you are depending on thirdparty libraries, don't mock those unless you have to (i.e. if you cannot run the tests). This will help avoid unexpected bugs after upgrading libraries or if supporting different versions of a library.

If you are writing code for a complex infrastructure (e.g. a plugin for an IDE), try to use test-specific versions of enough of the infrastructure to get the rest functioning, and use the real versions of as much as you can. This will help pick up issues in your code when new versions of that infrastructure make changes -- you want your tests to fail in this case, as the real code would fail.

Understand why your tests are breaking and address that. Use @Ignore as a last resort. If APIs or behaviour have changed, update the code and tests to reflect that. If you are supporting different versions, create version-specific compatibility layers.

As someone who has written thousands of tests, where I disagree with you is this: when writing unit tests, those application components should have their own tests. Then you can cleanly mock them, even if they are utilities etc.

Same goes for application code in the same service or whatever. Mock those calls and only test your unit of code.

If you don't do this things can be fine. But at some point the code base will become unwieldy and changing code in one place will break tests all over the place.

Integration tests are fine and they have their place but are not a substitute for proper unit tests.

I've generally preferred integration tests to unit tests but Storybook has completely changed how I write React components, for the better.

Agreed on integration tests over unit tests. I get the most bang for my buck with them. Storybook is pretty nice. I use it for developing smaller components, more of the building blocks of the application. Have you ever checked out Kent C. Dodds's react-testing-library? It was a refreshing approach to testing React components IMO and it has gained quite a bit of adoption.

The docs for react-testing-library look like they offer a great deal of wisdom in a tiny package. I find my React components tend to be simple enough that I'm generally far more concerned about whether the CSS works than the JS, leading me to probably keep my focus on Storybook for the moment but I suspect I'll be reaching for react-testing-library before too long. Thanks for the heads up about it.

I wrote a response to this article (and the follow-up) here: https://henrikwarne.com/2014/09/04/a-response-to-why-most-un...

I agree. Unit tests are for testing API usage, not implementation. We can use them to figure out whether the API design is good or not.

But the problem is that APIs change very often, and we don't want to change both the implementation and the tests just for the sake of an API change.

Integration tests are enough in most cases.

Even the API is an implementation detail of an abstraction.

This article seems to ignore the fact that unit tests are typically made for other programmers down the line as a form of documentation, as an informal description of business rules, and as a way to gauge whether new code will break the product in a very obvious way.

tl;dr I made it to "Testing does not increase quality; programming and design do" and quit.

It seems clear that the author has a strong opinion on this and perhaps that has been formed by exposure to unit tests done wrong. I suppose his article is worth reviewing and asking if any of my unit tests suffer from the problems he identified. I think most of his complaints relate to badly applied unit tests and I think we can all agree that any methodology can be badly applied. That does not provide grounds to condemn the methodology.

Having used Fortran (and Macro-11) as the first languages I employed professionally, I do not recall any thrust for unit testing at that time. Maybe it was just the shop I worked in. More recently I have used unit tests for Go, C/C++, Python, Perl, Java, shell scripts and probably some I'm forgetting. I wouldn't consider coding anything w/out some kind of unit test.

Back to the quote "Testing does not increase quality; programming and design do", I disagree vehemently with the claim that testing does not increase quality. While true that one cannot 'test in' quality, I find that designing code to be testable provides higher quality results. This is particularly true in languages such as shell scripts that often start small and grow until they are hundreds of lines of in-line code. Testable code is generally better partitioned and structured than it would otherwise be.

A second benefit to unit testing is immediate feedback and completion of parts of the system. I get a feeling of accomplishment when I complete something that passes its tests and prefer that to deferring satisfaction until the entire thing works.

Finally, if I test the bits in isolation, I can provide them data for all of the corner cases I think could cause trouble and make sure they work for a wider range of inputs than could easily be done during integration testing. When I do get to the point where I put the bits together, I have a much higher success rate with integration testing.

Obligatory(?) "Unit testing without integration testing" pic:


> If you find your testers splitting up functions to support the testing process, you’re destroying your system architecture and code comprehension along with it. Test at a coarser level of granularity.

This is very much in line with DHH's opinion about testing in general (the discussion "TDD is dead" is about this). He said that he doesn't want to split logic for the sake of testing, so HE drives the design NOT tests. I very much agree with this. I think a human can design better code (meaning that's easier to use, so easier to consume by other humans) than any automated process (like TDD).

I noticed his blog post quotes this PDF: https://dhh.dk/2014/tdd-is-dead-long-live-testing.html
