Unit Testing Is Overrated (tyrrrz.me)
501 points by ingve on July 9, 2020 | 387 comments

I tend to mentally divide code into roughly two types: "computational" and "plumbing".

Computational code handles your business logic. This is usually in the minority in a typical codebase. What it does is quite well defined and usually benefits a lot from unit tests ("is this doing what we intended"). Happily, it changes less often than plumbing code, so unit tests tend to stay valuable and need little modification.

Plumbing code is everything else, and mainly involves moving information from place to place. This includes database access, moving data between components, conveying information from the end user, and so on. Unit tests here are next to useless because a) you'd have to mock everything out, b) this type of code seems to change frequently, and c) it has less clearly defined behaviour.

What you really want to test with plumbing code is "does it work", which is handled by integration and system tests.

I've seen this concept called by many names including CQS[0] and Functional Core, Imperative Shell[1]. I'm just leaving this comment here for those that are interested in reading more.

[0] https://en.wikipedia.org/wiki/Command%E2%80%93query_separati...
[1] https://www.destroyallsoftware.com/screencasts/catalog/funct...

Functional / imperative doesn't exactly map onto these two concepts. "Computational" is often imperative, and integration code isn't always imperative (a lot of React code would fit in this box, for instance).

This is always a fun corner of programming terminology to me.

A ton of dense, mathy code like hash computation, de/serialization, sin/cos computation, etc. is usually best implemented in a memory-efficient C-style way but lends itself to being used in a very functional way: inputs and outputs without any retained state or side effects.

I think that subtlety is hard to articulate and gets lost.
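To make the point about dense, mathy code concrete, here is a sketch (in Python, chosen only for brevity) of the 32-bit FNV-1a hash, an example of my choosing rather than one from the thread: the body mutates a local accumulator in a C-style loop, yet callers see a pure function of the input bytes, with no retained state or side effects.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: imperative inside, a pure function from the outside."""
    h = 0x811C9DC5  # FNV-1a 32-bit offset basis
    for byte in data:
        h ^= byte                          # mix in one byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h
```

Because nothing outside the function is read or written, a unit test is just "these bytes in, this number out".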

I agree, but if you watch the linked screencast you can see that it's just Gary Bernhardt's take on Hexagonal/Clean/Onion Architecture.

The idea is that your business logic should be isolated from external dependencies (in his case, by making the code purely functional). That makes it easy to unit test the business logic, and your integration tests should be minimal (basically testing a single path to make sure everything is talking to each other).
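A minimal sketch of that structure in Python, assuming a made-up discount rule and hypothetical `load`/`save` callables in place of real database access: the core is pure and unit-tested directly, while the shell only does I/O and is exercised along a single end-to-end path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    subtotal_cents: int
    is_member: bool


# Functional core: a pure business rule (hypothetical: members get 10% off
# orders of 100.00 or more). No I/O, so it needs no mocks to test.
def total_cents(order: Order) -> int:
    if order.is_member and order.subtotal_cents >= 10_000:
        return order.subtotal_cents * 9 // 10
    return order.subtotal_cents


# Imperative shell: only moves data in and out. `load` and `save` stand in
# for real database or HTTP calls and are injected by the caller.
def checkout(order_id: str, load, save) -> None:
    order = load(order_id)
    save(order_id, total_cents(order))


# One end-to-end path, with in-memory dicts standing in for the database.
db = {"o1": Order(12_000, True)}
saved = {}
checkout("o1", db.__getitem__, saved.__setitem__)
```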

This is adapting the structure of your codebase to the capabilities of unit tests.

It has advantages, but it's an expensive waste of resources if you have cheap, effective integration tests.

Cheap integration tests is usually an oxymoron (when cheap refers to the tightness of the developer feedback loop).

Gary was coming from the land of Ruby on Rails, where a full set of integration tests could take hours. In that environment, structuring your code to enable easy testing of complex logic makes a lot of sense.

Likewise in a large enterprise environment, where integration testing across a (usually messy) set of interconnected dependencies is a pipe dream.

It's true that over-architecting is something to be wary of, but as usual, there's no one-size-fits-all answer.

It's cheaper to have a TDD test that takes 15 seconds to run and launches into an embedded kernel than it is to re-architect your whole system.

It doesn't matter if the whole test suite takes hours. CI servers don't need to be supervised.

One of the benefits of writing tests is that it makes it painfully obvious which parts of your codebase are poorly architected. Difficulty in writing tests is a code smell.

That's because unit tests couple tightly to your code. If you're trying to couple something additional to your already tightly coupled code it's gonna be painful.

It's a really expensive way of discovering that you wrote shit code.

"Computational" code that isn't by the vernacular definition of "functional"--and you can write functional code regardless of programming language--is something of a red flag to me.

Operate only on your inputs. Return all of your outputs. No side effects.
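A small illustration of those three rules in Python, using a hypothetical shopping-cart example: the first version depends on and mutates module-level state, while the second operates only on its arguments and returns everything it produces.

```python
# Impure: reads and mutates module-level state, so every test has to set it
# up and reset it, and the result depends on call history.
_cart: list[int] = []


def add_item_impure(price_cents: int) -> int:
    _cart.append(price_cents)
    return sum(_cart)


# Pure: operates only on its inputs and returns all of its outputs.
# The caller's cart is never modified; a new tuple is returned instead.
def add_item(cart: tuple[int, ...], price_cents: int) -> tuple[tuple[int, ...], int]:
    new_cart = cart + (price_cents,)
    return new_cart, sum(new_cart)
```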

I think OP meant that functional code is inherently imperative under the hood.

Sure, but "functional" code, in the vernacular, means a rather less specific version of that.

You can, after all, write "functional C". (It can be hard, though.)

I do like "abstract" and "concrete".

Abstract code solves made-up problems, while concrete code solves real ones. Normally the best way to solve a real problem is by rewriting it as a series of made-up problems, and solving those made-up ones instead.

The made-up problems don't need to be pure computational. Instead, if you restrict them to pure ones, you'll lose a lot of powerful ones. They also don't need to fit functional programming well, but there is no loss of generality on imposing that restriction.

Also, the more abstract you make those pieces of code, the less they'll need changing and the better unit tests will fit. At the extreme, once debugged they'll never change. Instead, if your needs change too much, your concrete programs will simply stop using them and use some completely different ones.

I think the problem with this line of thinking is that, most often, the difficulty lies exactly in rewriting the real problem into the made-up problems. So, to have useful tests, you need to check if your made-up problem solutions actually solve the real problem, which is difficult to express in the first place.

For example, let's say you want to get some users from your DB in response to an HTTP call. We rewrite this problem in terms of crafting some SQL query, taking some data from the HTTP request to create that query. We can of course easily test that the code creates the query we designed, that the query contains the right information from the HTTP request, etc. But if we don't actually run the query on the actual DB with the actual users, we don't really know if our query does the right thing, even if we know our code creates the query we intended. And if the DB changes tomorrow, our very abstract code that parametrizes a particular SQL query will still need to change, so our existing unit tests will be thrown away as well.

This is the kind of plumbing code the OP was talking about, and I don't think you can reduce the problem in any way to fix this (especially if the DB is an external entity).
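The distinction can be sketched in Python, with a hypothetical `build_user_query` function and an in-memory SQLite database standing in for "the actual DB" (itself an assumption: a different engine, schema, or data set can still behave differently). Checking `args` only confirms we built the query we designed; only running it tells us whether it returns the right users.

```python
import sqlite3


def build_user_query(params: dict) -> tuple[str, tuple]:
    """Hypothetical mapping from HTTP query parameters to a parameterised query."""
    sql = "SELECT name FROM users WHERE active = ? ORDER BY name"
    return sql, (1 if params.get("active") == "true" else 0,)


# Unit-level check: did we build the query we intended?
sql, args = build_user_query({"active": "true"})

# Integration-style check: does the query actually return the right rows?
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", 1), ("bob", 0), ("cy", 1)],
)
rows = conn.execute(sql, args).fetchall()
```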

There's nothing abstract about that query. You can easily confirm it by looking how you described it exclusively by business terms. Instead, it's the most concrete component on your comment, and it's not prone to unit testing in any way.

Humble Object[0] is another version of this.

[0] http://xunitpatterns.com/Humble%20Object.html
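A rough sketch of the Humble Object idea in Python, using a hypothetical reminder job: the decision logic is pulled out into a plain function that is easy to unit test, and the object that touches the clock and the send side effect is left too thin to need much testing.

```python
import time


# Extracted logic: a pure decision that is trivial to unit test.
def should_send_reminder(last_sent: float, now: float, interval_s: float = 3600.0) -> bool:
    return now - last_sent >= interval_s


# Humble object: wraps the hard-to-test parts (real clock, real sending)
# and delegates every decision to the function above.
class ReminderJob:
    def __init__(self, send):
        self._send = send        # e.g. a function that emails the user
        self._last_sent = 0.0

    def tick(self) -> None:
        now = time.time()
        if should_send_reminder(self._last_sent, now):
            self._send()
            self._last_sent = now
```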

> I tend to mentally divide code into roughly two types: "computational" and "plumbing".

I agree with this and would go even further. Divide your code into "stateless" functional code and "stateful" objects code.

Original OO was encapsulating things like device drivers that did I/O--it didn't represent data.

If you don't interleave your stateless business logic with your stateful persistence, it's easy to mock the "objects" that do the plumbing, and all the meat of the program can be unit tested.

Fwiw, the DI model (Guice, Spring, etc.) in modern Java/Scala shops closely hews to this, even if people don't mentally categorize it as such.
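A sketch of that split in Python, assuming a hypothetical repository object with `load_balance` and `save_balance` methods: the interest calculation is pure, and the stateful plumbing is replaced by a `unittest.mock.Mock` in the test.

```python
from unittest.mock import Mock


# Pure business logic: no I/O, no state. Amounts are integer cents and the
# rate is in basis points to avoid floating-point rounding.
def accrue_interest(balance_cents: int, rate_bps: int) -> int:
    return balance_cents + balance_cents * rate_bps // 10_000


# Plumbing: only moves data between the repository and the pure function.
def run_accrual(account_id: str, repo, rate_bps: int) -> None:
    balance = repo.load_balance(account_id)
    repo.save_balance(account_id, accrue_interest(balance, rate_bps))


# The stateful dependency is trivially mocked.
repo = Mock()
repo.load_balance.return_value = 10_000
run_accrual("acct-1", repo, rate_bps=250)
repo.save_balance.assert_called_once_with("acct-1", 10_250)
```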

> Divide your code into "stateless" functional code and "stateful" objects code.

IINM you are basically referring to the difference between static and instance methods in languages like C++ and Java.

Putting code that neither reads nor writes the object state in instance methods is a common mistake made in both those languages.

That said, both stateful and stateless code are good candidates for unit testing, especially when the code under test is a state machine, rather than just a data encapsulation mechanism.
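As an example of stateful code that still unit-tests well, here is a small transition-table state machine in Python (a hypothetical door with closed/open/locked states); every legal and illegal transition can be checked without any I/O.

```python
from enum import Enum


class Door(Enum):
    CLOSED = "closed"
    OPEN = "open"
    LOCKED = "locked"


# Transition table: (state, event) -> next state. Anything missing is illegal.
TRANSITIONS = {
    (Door.CLOSED, "open"): Door.OPEN,
    (Door.OPEN, "close"): Door.CLOSED,
    (Door.CLOSED, "lock"): Door.LOCKED,
    (Door.LOCKED, "unlock"): Door.CLOSED,
}


def step(state: Door, event: str) -> Door:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal event {event!r} in state {state.name}") from None
```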

Nah brah. (I'm gonna say "brah" because I'm feeling especially salty)

Your _program_ should have the flow of a function. At the architectural level, who-the-ef cares about static vs instance methods in Java (I say as a person with 23 years of Java experience). It has nothing to do with languages. You can do this in any language you want.

You want to have your inputs go through a process where you have (1) INPUT state transfer, (2) some computation F(INPUT), (3) some output and state transfer, or RESULT = F(INPUT).

If you do not have (1) or (3)--I hate to break it to you--but all your program does is burn CPU. If you don't have (2), your program does nothing at all.

The key thing with scalable systems is they manage complexity well. If you're at the level where you're worried about "static or instance methods", you're not dealing with how data changes in large systems at all. Those words are at the level of state within a language.

You need to optimize at the global systems level.

> At the architectural level, who-the-ef cares about static vs instance methods in Java

Who-the-ef should care is anyone who has to implement or maintain the code. After all, the debate at hand is what is worth unit testing, which very much concerns the programming language and the actual implementation. Don't know about you, but I both architect the system and write the code.

> If you do not have (1) or (3)--I hate to break it to you--but all your program does is burn CPU.

I haven't written production code that doesn't have (1) or (3) in my 25 years of programming, so not sure who you are talking to here.

> If you're at the level where you're worried about "static or instance methods", you're not dealing with how data changes in large systems at all. Those words are at the level of state within a language.

You have to tend to this stuff at both the generic data processing and language level. Using a given language's constructs for differentiating between stateful and stateless code is an important part of making the code document itself.

Coding style matters.

> Coding style matters.

It. does. not.

If it did, PHP wouldn't be running half the world. Structure and systems matter.

> It. does. not.

OK 'brah', whatever works for you!

> If it did, PHP wouldn't be running half the world

PHP has a style guide, and there is such a thing as clean, readable PHP code.

I bet massive-scale PHP-based apps (like, you know, Facebook) probably enforce style in their codebases.

This kind of categorising I've always found to be orthogonal to what should really be the measure of "does it deserve a [unit] test?". I believe the correct way to assess how much (if any) automated testing at whatever level is decided only by how valuable that thing, and the inverse of the impact of that thing going wrong, is.

If you are writing a one-shot script to transmute data from one format to another for, say, an upgrade, I don't care if you have unit tests as long as I am confident it has been manually tested to satisfaction. No repeatability, no regression requirement. There could be, and likely is, value in TDD, so tests might still be a thing if that is how you work. No objection there.

If you are developing the plumbing code that will ensure my system adheres to financial regulations and, if it were to break, land me in jail for negligence, you can be damn sure I'm demanding a test that will be run every time that system is built/deployed.

I wrote unit tests >10 years ago for formatting a string for postal codes that I know are still run to this day on every commit because if they get it wrong there is legal recourse for the company that owns that system.

It's also super quick to fix and failing at build is quicker and cheaper than failing in prod, even without the recourse. That test took me all of 1 minute to write. Bargain.
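A test like that might look roughly like this in Python. The format is an assumption for illustration only (a simplified Canadian-style "A1A 1A1" code); the commenter doesn't say which postal system was involved, and real postal rules have more restrictions than this.

```python
import re

# Hypothetical, simplified rule: letter-digit-letter digit-letter-digit,
# displayed in upper case with a single space in the middle.
_PATTERN = re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d")


def format_postal_code(raw: str) -> str:
    compact = re.sub(r"\s+", "", raw).upper()  # drop all whitespace, normalise case
    if not _PATTERN.fullmatch(compact):
        raise ValueError(f"not a valid postal code: {raw!r}")
    return f"{compact[:3]} {compact[3:]}"
```

Tests this small cost about a minute to write and run on every commit, which is the trade the comment describes.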

> I wrote unit tests >10 years ago for formatting a string for postal codes that I know are still run to this day on every commit because if they get it wrong there is legal recourse for the company that owns that system.

If it's critical for your business I'd categorize that as business logic, not plumbing code, well deserving of unit test coverage.

I don't really care for the distinction is my point. It's valuable and that's all that matters.

> I believe the correct way to assess how much (if any) automated testing at whatever level is decided only by how valuable that thing, and the inverse of the impact of that thing going wrong, is.

Unit tests and automated tests are two completely different concepts.

Vehemently disagree. Unit tests are a subset of automated tests.

Wonderful description. In brief:

Unit test algorithmic code; use integration tests for everything else, i.e. plumbing code.

There's a very controversial conclusion to this: some projects ought to have zero or close to zero unit tests.

I agree very strongly with this but a lot of people will be very unhappy with this idea.

I agree with this, I also admit that 95% of the code I write is plumbing code. (There is an art to making your plumbing nice).

Yes, I agree as well. Our company uses Spring to write banking software, and there is rarely a case that involves pure logic that can be separated from its dependencies. I used to try isolating code into separate methods that took no dependencies, but it just made the code harder to read. Now we just test by invoking the gRPC endpoints, including the DB (with rollback), and it works quite well.
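The "include the DB, with rollback" approach carries over to other stacks. This is a Python analogue, not the commenter's Spring/gRPC setup: an in-memory SQLite database, a hypothetical `create_user` function under test, and a context manager that always rolls back the transaction so each test leaves the database as it found it.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Run a test body inside a transaction that is always rolled back."""
    try:
        yield conn
    finally:
        conn.rollback()


def create_user(conn: sqlite3.Connection, name: str) -> None:
    # Hypothetical code under test; the INSERT implicitly opens a transaction.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL)")

with rolled_back(conn):
    create_user(conn, "alice")
    # The same connection sees its own uncommitted write.
    count_inside = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

count_after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```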

I would suggest making the business logic stateless methods ("functions") that pass immutable data records between them.

That allows strict separation of all I/O from testable business logic.

If you can't separate pure logic from your I/O, it means you have a Russian-doll program that looks like:

readFromApi {
  doBusinessLogic {
    writeToPersistence
  }
}

Instead of a pipeline like:

a <- readFromApi

b <- doBusinessLogic(a)

c <- writeToPersistence(b)

If you do things this way, you can always isolate your business logic from your dependencies.
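The pipeline above, sketched in Python with hypothetical record types and a made-up business rule: `read_from_api` and `write_to_persistence` are injected callables, so `do_business_logic` can be unit tested with plain values and the whole pipeline can be run with in-memory stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiUser:
    name: str
    age: int


@dataclass(frozen=True)
class UserRecord:
    name: str
    is_adult: bool


def do_business_logic(user: ApiUser) -> UserRecord:
    # Pure: takes an immutable record and returns a new one; no I/O.
    return UserRecord(name=user.name.strip().title(), is_adult=user.age >= 18)


def run(read_from_api, write_to_persistence) -> None:
    a = read_from_api()          # I/O in
    b = do_business_logic(a)     # pure computation
    write_to_persistence(b)      # I/O out
```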

The problem is that doBusinessLogic(a) is often entirely about transforming a into whatever the current DB accepts. Sure, you can write a test to check that b.Field_old == a["field"], but this buys you very little. The real question is whether you should have mapped a["field"] or a["oldFields"]["Field"] to b.Field_old, and your unit test is not going to tell you that; you need an integration test to actually verify that you made the right transformations and you're getting the correct responses.

By all means, if the transformation is non-trivial, and it is captured entirely in the logic of this method, not in the shape of the API and the DB, then you should unit test it (e.g. say you are enforcing some business rules, or computing some fields based on other fields). But if you're just passing data around, this type of testing is a waste of time (you don't have reasons to change the code if the API or DB don't change, so the tests will never fail), and brittle (changes in the API or in the DB will require changing both the code and the tests, so the tests failing doesn't help you find any errors you didn't know about).

> The real question is whether you should have mapped a["field"] or a["oldFields"]["Field"] to b.Field_old, and your unit test is not going to tell you that, you need an integration test to actually verify that you made the right transformations and you're getting the correct responses.

So I would argue you don't actually have business logic then. Your service is anemic, and you have a data transformation you need to do. I definitely think that you should do an integration test for that.

Moving JSON -> Postgres or whatever is something that you absolutely still can test with the output of the DML statement by your DB library. It may be a silly test, but that's because if there's no business logic, it's a silly program _shrug_.

While it's bad form to reply to your own post, I might add that this is just what a function is, in the large; you're simply viewing your whole program this way.

a <- readFromApi ( Input x )

b <- doBusinessLogic(a) ( f(x) )

c <- writeToPersistence(b) ( Output y = f(x) )

You can also imagine that there is more than one lookup from the DB or service call as I/O in different parts of the pipeline (g(f(x)), etc.), but it's always possible to pull state in explicitly and push it down explicitly into business logic as an argument. It tends to make programs have flatter call stacks as well.

The people that place a controversy on this can be safely ignored.

Wise words spoken before me:

>> Write tests. Not too many. Mostly integration.

Watch the Boeing fall from the sky. (The developer missed that particular configuration in testing.)

The amount of effort spent finding errors before you ship has to be related to the cost of fixing errors, including the consequences of the errors if they're found after you ship.

If errors in your system result in death, and if changes must go through an expensive and time consuming process to be approved, and then an expensive and time consuming process to be applied, you should spend a lot of time ensuring your design is sound, and your implementation matches your design. A good place for formal methods.

If you're writing server-side code, and deploy takes 5 minutes, you can be a cowboy for most things that won't leave a persistent mess or convince customers to leave.

If you're writing client-side code that needs to go through a pre-publication review, neither cowboy coding nor formal methods is a good choice.

What if you're a libffmpeg or libsdl developer?

Where do you slot device drivers in that hierarchy?

Yes! I do something similar, which is sometimes referred to as "functional core, imperative shell". My goal is to put as much code as possible in the computational/functional part. This part is easy to test since it's pure. The remaining plumbing/imperative part has much less code, fewer dependencies, and less logic, which as you say doesn't need unit tests anymore. It needs less dependency injection as well, which is a huge bonus.

You should still be careful that your pure logic is actually doing something by itself, rather than just massaging data from one external format to another external format.

A lot of code can fall in this area, where it is absolutely unit testable but the unit tests are almost entirely useless: the code only ever changes because the input or output types change, so the tests also need to change.

I think of this in terms of code that is 'authoritative' for its logic or not.

For example, a sorting method is authoritative - it is the ultimate definition of what sorting means. Also, a piece of code that validates some business rule defined in a document is the authority for that business rule.

But a piece of code that takes input from the user and passes it to some other piece of code is not authoritative for this transformation. The functionality of this kind of code is not defined by some spec, but by 'whatever the other piece of code wants to receive', which may be arbitrarily hard to define.

Depending on the complexity of the transformation, there may still be reasons to test parts of this code, at least to ensure that a new field here doesn't affect the way we transform that other field there, but often only small pieces of it are actually worth testing.

This has been a problem for me with my quarantine project. It's little more than a CRUD app: get some data, download it, display it on the screen. There's practically no business logic to it; the entire project is wiring up various XaaS. By the time I mocked everything that needed to be mocked, I'd have put more effort into mocking than the project itself.

I test the parts that are actually mine as best I can, but most of my debugging consists of driving it by hand.

> By the time I mocked everything that needed to be mocked, I'd have put more effort into mocking than the project itself.

More importantly, that your app works with the mocks doesn't give you good information about whether your app works with the actual services.

I unit test business logic since that is the core of the application and MUST work as expected.

I'm not going to unit test that a link someone clicks on goes to the page they expect.

This. Making the distinction between the two is huge. Save time and money testing the right pieces of a codebase.

Absolutely. One other takeaway is "write less plumbing code". Write library code that simplifies your plumbing, and unit test that.

This is sort of what I've done with some success in developing games. Games in general are grossly under-tested, but there are a few good reasons for that. Lots of systems can be effectively tested by just playing, and it's often tough to tease out units as small as you would for useful isolated testing in other types of programs.

What I've been doing is writing as many parts of the game as libraries as is possible, and then implementing the minimal possible usage of that library as a semi-automated test. For instance, our collision system is implemented as a library, and you can load up a "game" that has the simplest possible renderer, no sound, basic inputs, etc. and has a small world you can run around in that's filled with edge cases. This was vastly easier than trying to write automated tests for 3d collision code, and you get the benefit of testing the system in isolation, if not automatically. For other libraries like networking, the tests are much more automated, but they poke the library as a unit, rather than testing all the little bits and pieces individually.

I really wish I had come up with this; it really neatly captures my experiences and how sometimes unit tests were really useful (developing a (Benefit) Claims Engine, which essentially did a bunch of complex calculations and then spat data out), whereas other times unit tests just feel like a massive chore, with mocks and similar stuff that add little to no value and certainly should've been at a higher level (integration or system tests), but the powers that be wanted coverage.

> I tend to mentally divide code into roughly two types: "computational" and "plumbing".

I think of the "computational" type more as a "deterministic data transformation" type. That applies to transformations of any data whether text, images, or the state of a machine.

I think of plumbing as the movement of data without any transformation, or, if a transformation occurs, it occurs at an abstracted layer that must itself be unit tested independently.

My thoughts exactly. Unit tests are a huge help in computational-heavy portions of a project and are easy to write. The other areas of a project don't benefit as much and the tests are harder to write and keep maintained.

I'd add that perhaps for this 'plumbing code', the way you describe it, gradual/static typing is a great solution.

Computation can be seen as plumbing. I think what you mean by "computational" is complicated plumbing.

Same point being made in the blog post. I do recommend others to read the post though - good stuff.

I completely agree, but it doesn't help me hit the code coverage goal foisted upon me.

What if you write code that isn't for a business? How does your workflow apply then?

Business logic does not mean it has to be for a business. It's more like calling the pointy part of a spear the "business end". It's the part that does the job.

Business Logic is a euphemism. It doesn't mean literally business logic, it means the 'core functionality of your code.' When you design software, you typically model some real world process or system in the abstract. Business logic is the core problem of your model. You can also call it model code, or core functionality. It all means the same thing - it's the important part of your app.

Using the old Asteroids arcade game [1] as an example: The business logic is how many lives the player has, what happens when you shoot asteroids (they break up, or disintegrate if they're small), what happens when you reach the edge of the map (you wrap around the other side), what kind of control scheme there is (there's momentum in asteroids, you don't stop on a dime) etc.

1) https://www.youtube.com/watch?v=WYSupJ5r2zo

"Domain logic" might be a better euphemism. Consider a library that encrypts text with AES-256. You might want unit tests that verify the IV, cipher block, plaintext and encrypted text (result) of that function. The method "encrypt" might be your "business logic" that ought to be unit tested.

"Business logic" is just another name for "logic". E.g. something like "if X is even, then print 'fizz', otherwise print 'fuzz'" is considered business logic.

Wish I could upvote this more than once.
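For what it's worth, the "fizz"/"fuzz" rule from the earlier comment is already a unit of business logic. A Python sketch that returns the string instead of printing it, so the rule can be tested without capturing output:

```python
def fizz_or_fuzz(x: int) -> str:
    """'fizz' for even numbers, otherwise 'fuzz' (the rule quoted in the thread)."""
    return "fizz" if x % 2 == 0 else "fuzz"
```

Printing then becomes a one-line piece of plumbing, e.g. `print(fizz_or_fuzz(n))`.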

I can't believe I'm wasting my time on another testing debate.

Speaking as a formerly young and arrogant programmer (now I'm simply an arrogant programmer), there's a certain progression I went through upon joining the workforce that I think is common among young, arrogant programmers:

1. Tests waste time. I know how to write code that works. Why would I compromise the design of my program for tests? Here, let me explain to you all the reasons why testing is stupid.

2. Get burned by not having tests. I've built a really complex system that breaks every time I try to update it. I can't bring on help because anyone who doesn't know this code intimately is 10x more likely to break it. I limp to the end of this project and practically burn out.

3. Go overboard on testing. It's the best thing since sliced bread. I'm never going to get burned again. My code works all the time now. TDD has changed my life. Here, let me explain to you all the reasons why you need to test religiously.

4. Programming is pedantic and no fun anymore. Simple toy projects and prototypes take forever now because I spend half of my time writing tests. Maybe I'll go into management?

5. You know what? There are some times when testing is good and some times where testing is more effort than it's worth. There's no hard-set rule for all projects and situations. I'll test where and when it makes the most sense and set expectations appropriately so I don't get burned like I did in the past.

One of the dark arts of being an experienced developer is knowing how to calculate the business ROI of tests. There are a lot of subtle reasons why they may or may not be useful, including:

- Is the language you're using dynamic? Large refactors in Ruby are much harder than in Java, since the compiler can't catch dumb mistakes

- What is the likelihood that you're going to get bad/invalid inputs to your functions? Does the data come from an internal source? The outside world?

- What is the core business logic that your customers find the most value in / constantly execute? Error tolerances across a large project are not uniform, and you should focus the highest quality testing on the most critical parts of your application

- Test coverage != good testing. I can write 100% test coverage that doesn't really test anything other than physically executing the lines of code. Focus on testing for errors that may occur in the real world, edge cases, things that might break when another system is refactored, etc.
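To illustrate the coverage point, here are two hypothetical tests in Python for a made-up `apply_discount` function. Both execute every line of the function, but only the second would fail if the arithmetic or the boundary check were wrong.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    # Hypothetical function under test.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


def test_executes_every_line():
    # 100% line coverage, but nothing about the results is asserted.
    apply_discount(1000, 10)
    try:
        apply_discount(1000, 150)
    except ValueError:
        pass


def test_checks_behaviour():
    # Same lines covered, but outputs and edge cases are pinned down.
    assert apply_discount(1000, 10) == 900
    assert apply_discount(1000, 0) == 1000
    assert apply_discount(1000, 100) == 0
    try:
        apply_discount(1000, 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```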

I now tend to focus on a black box logic coverage approach to tests, rather than a white box "have I covered every line of code" approach. I focus on things like format specifications, or component contract definitions/behaviour.

For lexer and parser tests, I tend to focus on the EBNF grammar. Do I have lexer test coverage for each symbol in a given EBNF, accepting duplicate token coverage across different EBNF symbol tests? Do I have parser tests for each valid path through the symbol? For error handling/recovery, do I have a test for a token in a symbol being missing (one per missing symbol)?

For equation/algorithm testing, do I have a test case for each value domain? For numbers: zero, negative number, positive number, min, max, values that yield the min/max representable output (and one above/below this to overflow).
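A sketch of value-domain tests for 32-bit addition in Python. Python integers don't overflow, so `checked_add_i32` (a hypothetical helper) checks the range explicitly; the cases cover zero, negative, positive, min, max, inputs that land exactly on a boundary, and one step past it.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def checked_add_i32(a: int, b: int) -> int:
    """Add two 32-bit integers, raising OverflowError instead of wrapping."""
    result = a + b  # Python ints are unbounded, so check the range explicitly
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} overflows int32")
    return result


# One case per value domain, plus results that land exactly on a boundary.
CASES = [
    (0, 0, 0),
    (-5, 3, -2),
    (5, 3, 8),
    (INT32_MIN, 0, INT32_MIN),
    (INT32_MAX, 0, INT32_MAX),
    (INT32_MAX - 1, 1, INT32_MAX),
    (INT32_MIN + 1, -1, INT32_MIN),
]
# One step past each boundary.
OVERFLOWS = [(INT32_MAX, 1), (INT32_MIN, -1)]

for a, b, expected in CASES:
    assert checked_add_i32(a, b) == expected
for a, b in OVERFLOWS:
    try:
        checked_add_i32(a, b)
    except OverflowError:
        pass
    else:
        raise AssertionError(f"expected overflow for {a} + {b}")
```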

I tend to organize tests in a hierarchy, so the tests higher up only focus on the relevant details, while the ones lower down focus on the variations they can have. For example, for a lexer I will test the different cases for a given token (e.g. '1e8' and '1E8' for a double token), then for the parser I only need to test a single double token format/variant as I know that the lexer handles the different variants correctly. Then, I can do a similar thing in the processing stages, ignoring the error handling/recovery cases that yield the same parse tree as the valid cases.
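A rough sketch of the lexer half of that hierarchy in Python, assuming a simplified double-literal token defined by a regular expression (the grammar is hypothetical): the lexer tests cover each spelling of the exponent, so parser tests can use a single representative like `1e8`.

```python
import re

# Hypothetical token rule: digits, optional fraction, optional exponent
# written with either 'e' or 'E' and an optional sign.
DOUBLE = re.compile(r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def lex_double(text: str) -> float:
    if DOUBLE.fullmatch(text) is None:
        raise ValueError(f"not a double literal: {text!r}")
    return float(text)  # the regex decides what is accepted; float() converts


# Lexer-level tests cover the spelling variants of the same value.
for spelling in ["1e8", "1E8", "1e+8", "100000000.0"]:
    assert lex_double(spelling) == 1e8
```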

I think you missed an important one, which is: how much do bugs even matter?

A bug can be critical (literally life-threatening) or unnoticeable. And this includes the response to the bug and what it takes. When I write code for myself I tend to put a lot of checks and crash states rather than tests because if I'm running it and something unexpected happens, I can easily fix it up and run it again. That doesn't work as well for automated systems.

You should understand when those tests are low effort: look for frameworks that make those tests easier to develop, or frameworks that remove the requirement for you, e.g. Lombok for generating getters/setters. You only have to unit test code that you wrote.

High test coverage comes from a history of writing tests there. Sadly, people include feature and functional tests in the coverage.

There's an easier answer and that is - as an experienced programmer - don't write any tests for your 'toy project' - at least not at the start.

The missing bits in the discussion are 1) churn, and 2) a dev's ability to write fairly clean code.

Early-stage and 'toy' projects may change a lot, in fundamental ways. There may be total rewrites as you decide to change out technologies.

During this phase, it's pointless to try to 'harden' anything because you're not sure what it's entirely supposed to do, other than at a high level.

Trying Amazon DynamoDB, only to find a couple of weeks in that it's not what you need ... means it probably wouldn't make sense to run it through the gamut of tests.

Only once you've really settled on an approach, and you start to see the bits of code that look like they're not going to get tossed, does it make sense to start running tests.

Of course the caveat is that you'll need to have enough coding experience to move through the material quickly, in that, no single bit of code is a challenge, it's just 'getting it on the screen' takes some labour. The experience of 'having done it already many times' means you know it's 'roughly going to work'.

I usually try to 'get something working' before I think too hard about testing, otherwise you 3x the amount of work you have to do, most of which may be thrown out or refactored.

Maybe another way of saying it, is if a dev can code to '80% accuracy' - well, that's all you need at the start. You just want the 'main pieces to work together'. Once it starts to take shape, you've got to get much higher than that, testing is the way to do that.

This is the approach I take as well, and also think about it in terms of “setting things in stone”.

When you’re starting out a project and “discovering” the structure of it, it makes very little sense to lock things in place, especially when manual testing is inexpensive.

Once you have more confidence in your structure as it grows you can start hardening it, reducing the amount of manual testing you do along the way.

People that have hard and fast rules around testing don’t appreciate the lifecycle of a project. Different times call for different approaches, and there are always trade offs. This is the art of software.

I agree with all your points. Have you looked at any strongly typed functional language from the ML family, like OCaml, F#, Rust, or something similar like Haskell?

If you do make a slight tweak somewhere, the compiler will tell you there’s something broken in obscure place X that you would find out at runtime say with Ruby or Python.

THATS the winning formula. I’ve written so many tests for Python ensuring a function’s arguments are validated rather than the core logic/process of it.
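For readers coming from C++ rather than an ML-family language, a rough analogue of the compiler catching breakage is std::variant plus std::visit. This is an illustrative sketch only; Shape, Circle, Square and Overloaded are made-up names. If another alternative is added to Shape and area() is not updated, std::visit finds no overload for it and the build fails, which is the kind of error the comment says Ruby or Python would only surface at runtime.

```cpp
#include <cassert>
#include <variant>

struct Circle { double radius; };
struct Square { double side; };
using Shape = std::variant<Circle, Square>;

// Common C++17 idiom: merge several lambdas into one overloaded visitor.
template <class... Fs> struct Overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> Overloaded(Fs...) -> Overloaded<Fs...>;

double area(const Shape& shape) {
  // If a new alternative (say, a Triangle) is added to Shape without a matching
  // lambda here, this call no longer compiles instead of failing at runtime.
  return std::visit(
      Overloaded{
          [](const Circle& c) { return 3.141592653589793 * c.radius * c.radius; },
          [](const Square& s) { return s.side * s.side; }},
      shape);
}
```

This is weaker than ML-style exhaustiveness checking (it only covers variants you visit this way), but it moves the same class of mistake to compile time.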

> THATS the winning formula.

Not so fast. For some problems it's great, for other ones it's not.

Have you tried writing numeric or machine learning code in Haskell? You'll notice that the type system just doesn't help you enforce correctness. Have you tried writing low level IO? The logic is too complex to capture on types, if you try to use them you'll have a huge problem.

> Have you tried writing low level IO? The logic is too complex to capture on types, if you try to use them you'll have a huge problem.

Rust's got a very Haskell-like type system, but it's a systems programming language. People are literally writing kernels in it. I think this is a pure-functional-is-a-bad-way-to-do-real-time-I/O thing, not a typing thing.

While this is true in some senses, Rust's type system is very different than Haskell's when it comes to handling IO.

That said, I don't think it's impossible to type IO. https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-typ... isn't the same problem, but it's related.

Hum... Pure functional is a bad way to do real time I/O, but my point was about types.

If you try to verify the kind of state machines that low level I/O normally use with Haskell-like types, you will gain a huge amount of complexity and probably end with more bugs than without.

Low-level I/O doesn't seem to have that much complexity, unless you're trying to handle all of the engineers' levels of abstraction at once.

Let's say you're writing a /dev/console driver for an RS-232 connection. Trying to represent "ring indicator", "parity failure", "invalid UTF-8 sequence", "keyboard interrupt", "hup" and "buffer full" at the same level in the type system will fail abysmally, but that's not a sensible way of doing it.

I could definitely implement this while leveraging the power of Rust's type system – Haskell would be a stretch, but only because it's side-effect free and I/O is pretty much all side-effects.

I have only done a little bit but I know exactly what you're talking about and it's great.

Really give it a go! It is beyond worldly. If you think TypeScript is great, then OCaml/F# will make it look inferior.

If you're doing React + TypeScript, give ReasonML a go: it's a syntax on top of OCaml that compiles to JavaScript using BuckleScript. OCaml has the fastest compiler out there.

[0] https://reasonml.github.io/

You could always go even further to the FP darkside and join the Purescript community >:)

How’s the tooling for that? Haskell has the “best” compiler but garbage tooling, when there should be great tooling built on top of the ol’ Rolls-Royce engine it’s rocking.

Meanwhile the plugins and IDE integrations for Reason/Ocaml and F# are ready to go from the start and work pretty well.

Just a data point: with my current team, everyone jumped right in and wrote code and wrote tests from the start. The tests were integration tests that depended on the test database. That worked great at first, but then tests started failing sporadically as the suite grew. Turning off parallelism helped a bit, but not entirely. Stories started taking longer too, as features entailed broad changes - it felt like every story was leading to merge conflicts and interdependency, where one person didn't want to implement their fix until someone else finished something that would change the code they were going to work on.

So then I came along and said, "hey, why don't we have any unit testing?" and it turns out because it was pretty impossible to write unit tests with our code. So I refactored some code and gave a presentation on writing testable code - how the point of unit testing isn't just to have lots of unit tests, how it's more that it encourages writing testable code, and that the point of having testable code means that your codebase is then easier to change quickly.

I even showed a simple demonstration based off of four boolean parameters and some simple business logic, showing that if it were one function, you'd have to write 16 tests to test it exhaustively, but if you refactored and used mocking, you'd only have to write 12. That surprised people. Through that we reinforced some simple guidelines of how we'd like to separate our code, focusing on pure functions when possible, making layers mockable. We don't even have a need for a complicated dependency injection framework as long as we reduce the # of dependencies per layer.
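The presentation itself isn't shared, but one hypothetical way the 16-vs-12 arithmetic can work out is sketched below. Assume the four-boolean rule splits into two independent two-boolean rules plus a layer that combines them: each small rule needs 2^2 = 4 tests, and the combining layer, with both rules stubbed out, needs another 4, so 4 + 4 + 4 = 12 instead of 2^4 = 16. The rule and all names (hasDiscountSource, isExcluded, Rules, eligibleForDiscount) are invented for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical rule: a customer gets a discount if they are a member or have a
// coupon, unless the item is both on sale and on clearance.
bool hasDiscountSource(bool isMember, bool hasCoupon) { return isMember || hasCoupon; }
bool isExcluded(bool isSaleItem, bool isClearance) { return isSaleItem && isClearance; }

// The combining layer depends on the two rules only through injected callables,
// so a test can replace them with stubs (the "mocking" step).
struct Rules {
  std::function<bool(bool, bool)> discountSource = hasDiscountSource;
  std::function<bool(bool, bool)> excluded = isExcluded;
};

bool eligibleForDiscount(bool isMember, bool hasCoupon, bool isSaleItem,
                         bool isClearance, const Rules& rules = Rules{}) {
  return rules.discountSource(isMember, hasCoupon) &&
         !rules.excluded(isSaleItem, isClearance);
}

// Test budget: 4 cases for hasDiscountSource, 4 for isExcluded, and 4 for
// eligibleForDiscount with stubbed rules (true/false x true/false) = 12,
// versus 2^4 = 16 for testing the four-input function as a black box.
```

In a test, the combining layer can be checked with stub lambdas, e.g. `Rules{[](bool, bool) { return true; }, [](bool, bool) { return false; }}`, without exercising the real rules at all. The saving depends entirely on the logic actually decomposing this way.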

Since that time we've separated our test suite into integration tests and unit tests, with instructions to rewrite integration tests to unit tests if possible. (Some integration tests are worthwhile, but most were just because unit tests were hard at that time.) We turned parallelism back on for the unit test suite. The unit tests aren't flaky, and now people are running the unit test suite in an infinite loop in their IDE. Over that time our codebase has gotten better structured, we have less interdependence and merge conflicts, morale has improved, velocity has gone up.

Anyway, according to this article it sounds like we've done basically the opposite of what we should have done.

Do you have a link to your presentation?

Sorry, nothing that's so good for the general public, but the general gist is that the goal for a test is something that is simultaneously small, fast, and reliable.

And that by following those three principles, it kind of drives you to writing testable code. Because if you don't, you might have tests that are only small (simple integration tests), or only fast and reliable (testing unfactored code with lots of mocking) - and that the only way to do all three is by refactoring to write testable code that has good layer separation and therefore minimal mocking requirements.

There was stuff in there about how mutable state and concurrency lead to non-determinism and therefore unreliable tests, which is part of what justifies pushing towards pure functions that can be easily unit tested without mocking.
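A minimal illustration of the pure-function point, assuming a made-up business-hours rule: the decision itself takes the hour as a parameter, so its tests are deterministic, while only a thin wrapper touches the system clock. Both function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Pure core: the decision depends only on its argument, so tests are deterministic.
bool isBusinessHour(int hour) { return hour >= 9 && hour < 17; }

// Imperative shell: reads the real clock and delegates to the pure core.
// Testing this directly would make the result depend on when the test runs.
bool isBusinessHourNow() {
  std::time_t now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
  std::tm local = *std::localtime(&now);  // std::localtime is not guaranteed thread-safe
  return isBusinessHour(local.tm_hour);
}
```

Only isBusinessHour needs unit tests; the wrapper is small enough to cover with an integration or manual check.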

> because I spend half of my time writing tests.

Only half your time? You're doing testing wrong if it doesn't take 80% of the time ;-)

I have a love-hate relationship with testing. Working for myself as a company of one, some of the benefits testing brings just don't apply. I have a suite of programs built in the style of your point (1). The programs were quick to market and were hacked out whilst my savings ran out, not knowing if I would make a single sale.

Sales came, customer requests came, new features were wanted, sales were promised "if the program could just do xyz". More things were hacked on. The promise of "I will go back and do this properly and tidy up this god unholy mess of code" slowly slipped away, until I stopped lying to myself that I would do it.

Yes, there was a phase of fix one problem, add another, but I have most of that in my head now and it has been a long time since that happened.

Not a single test. Developing the programs was "fun" and exciting. Getting requests for features in the morning and having the build ready by lunch kept customers happy.

Now I am redoing the apps as a web app for "reasons". This time I am doing it properly, testing from the start. I know exactly what the program should do and how to do it, unlike the first time when I really had no idea. But still, I come to a point and realise the design is wrong because I hadn't taken something into consideration. Changing the code isn't so bad; changing the tests, O.M.G.

I am so fed up with the project that I do all I can to avoid it; it is 2 years late, and I wish I had never started it. The codebase has excellent testing, mocks, no little hacks; engineering-wise I am proud of it. The tests have caught little edge cases that would otherwise have been found by customers, so that was avoided. But there is no fun in it. No excitement. It is just a constant, drudging slog.

I am trying to avoid dismissing testing altogether, as I really want to see its benefit in a substantial production codebase. If I ever get there. At the moment, the codebase is the best-tested unused software ever written IMO.

Well, then stop! Delete all the tests right now and do it however you want to do it.

The thing about testing that never really gets talked about is: what's the penalty for regressions? What are the consequences if you ship a bug so bad the whole system stops working?

Well, if you're building a thing that's doing hundreds of millions in revenue, that might be a big deal. But you? You're a team of one! You rollback that bad deploy and basically no one cares!

Your customers certainly don't care if you ship bugs. If it was something important enough where they REALLY cared, they wouldn't be using a company of one person.

So, go for it. Dismiss tests until you get to a point where you fear deploying because of the consequences. Then add the bare minimum of e2e tests you need to get rid of that fear, and keep shipping.

There is another cost, if you try and fix a bug and break something else. If your codebase becomes so brittle that you feel like you can't do anything without breaking something else, that makes it unbearable to keep going with that project.

Having said all that, I find that it's better to avoid doing some unit tests when building your own project. It can be better to do the high level tests (some integration, focused on system) to make sure the major functionality works. In many cases, for an app that's not too complicated, you can just have a rough manual test plan. Then move to automated tests later on if the app gets popular, or the manual testing becomes too cumbersome.

It's still good to have a few unit tests for some tricky functions that do complicated things so you aren't spending hours debugging a simple typo.

Sure. My point wasn't really whether to write unit tests or not. It's more, do what works for you / your team to enable you to ship consistently. For the OP, spending all of their time writing tests clearly isn't working for them if they haven't shipped at all.

> Well, if you're building a thing that's doing hundreds of millions in revenue, that might be a big deal. But you? You're a team of one! You rollback that bad deploy and basically no one cares!

Human lives, customer faith in the product, GDPR violations, HIPAA violations, data, time/resources in space missions.



> But you? You're a team of one! You rollback that bad deploy and basically no one cares!

I somehow doubt that comparing this 'team of one project' to the Mars Climate Orbiter leads to any useful conclusions. It's a nice bit of hyperbole though!

Rollbacks can create data loss. Also, rollbacks are not always a viable option.

Anyway, this was to address the issue of a bug. I took the comment of "it's just a team of one" as a way of trying to justify not putting your engineering due diligence into delivering a product to the customer.

> Rollbacks can create data loss. Also, rollbacks are not always a viable option.

I've delivered a number of products (in the early days of my career) to clients where data loss happened and while not fun, it also didn't significantly harm the product or piss off said client. I saw my responsibility primarily to do the best I could and clearly communicate potential risks to the client.

> I took the comment of "it's just a team of one" as a way of trying to justify not putting your engineering due diligence into delivering a product to the customer.

That I do agree with, but 'due diligence' is a very vague concept. I guess honest communication about the consequence of various choices is perhaps the core aspect?

And of course 'engineering due diligence', in my opinion, includes making choices that might lead to an inferior result from a 'purely' engineering perspective.

> not putting your engineering due diligence into delivering a product to the customer.

Yes. This is exactly what this person should do. Stop worrying about arbitrary rules and just deliver the damn product already. A hacky, shitty, unfinished product in your customer's hands that can be iterated on beats one that never got shipped at all every day of the week.

I don't understand your point. We're specifically talking about a one person company.

LOL. I guess I was being a bit conservative with that estimate!

I've worked for myself as well and know what you mean. In my situation, I was able to save myself from testing by telling my customers "this is a prototype so expect some issues".

My observation around codebases that weren't written with/for unit tests is that they always end up being a monolith that you have to run all of in order to run any of. Having decent code coverage means that it's at least possible to run just that one function that fails on the second Tuesday of the month when that one customer in Albania logs in.

Your points are fine, but I do not see how they apply to the blog post.

Overall, the blog post says that unit tests take a long time to write compared to the value they bring, and that you should instead (or also) focus on more valuable automated integration/e2e tests, because these are much easier to write than they were 10-20 years ago.

My point is that OP is in step 1 of 5. It's not to say there aren't any good thoughts there, but the overall diatribe comes from a place of inexperience so take their advice with a grain of salt.

I don't think OP is at step 1. OP is not arguing against testing, although the title could lead one into thinking that. OP is arguing for better, more reasonable testing.

OP appears to be arguing what you call step 5 of 5. They're not even saying you should never unit test, only that it should be avoided where it doesn't make sense, and that this happens more often than step-3 people like to think. Furthermore, the main direction of the article is that it's arguing for integration testing as a viable replacement for unit testing in a lot of situations, which doesn't relate to your overall point at all.

the comment is relatable to a testing mindset progression, which is relevant.

Your comment on the other hand, less so...

Step 5 touches on what I like to call "engineering judgment".

One of the things that distinguishes great engineers is that they make good judgment calls about how to apply technology or which direction to proceed. They understand pragmatism and balance. They understand not to get infatuated with new technologies but not to close their minds to them either. They understand not to dogmatically apply rules and best practices but not to undervalue them either. They understand the context of their decisions, for example sometimes code quality is more important and other times getting it built and shipped is more important.

As in life, good and bad decisions can be the key determiner of where you end up. You can employ a department full of skilled coders and make a few wrong decisions and your project could still end up a failure.

Some people never develop good engineering judgment. They always see questions as black and white, or they can't let go of chasing silver bullet solutions, etc.

Anyway, it's one thing to understand how to do unit tests. It's another thing to understand why you'd use them, what you can and can't get out of them, what the costs are, and take into account all that to make good decisions about how/where to use them.

This. These days I write unit tests only for functions whose mechanism is not immediately clear. The tests serve as documentation, specification of corner cases, and assurance for me that the mechanism does what it was intended to do.

I keep tests together with the code, because of their documentation/specification value.
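The comment doesn't say which language is used. In C++, one way to keep corner-case "tests" right next to the code is constexpr plus static_assert, which checks them on every compile. page_count is a hypothetical helper, and the sketch assumes per_page > 0.

```cpp
#include <cassert>

// Hypothetical helper: pages needed to show `items` entries, `per_page` at a time.
// Precondition (not checked here): per_page > 0.
constexpr int page_count(int items, int per_page) {
  return items <= 0 ? 0 : (items + per_page - 1) / per_page;
}

// Corner cases documented right next to the code; a violation fails the build.
static_assert(page_count(0, 10) == 0, "no items, no pages");
static_assert(page_count(1, 10) == 1, "a partial page still counts");
static_assert(page_count(10, 10) == 1, "exactly one full page");
static_assert(page_count(11, 10) == 2, "one item spills onto a second page");
static_assert(page_count(-5, 10) == 0, "negative counts are treated as empty");
```

This only works for functions that can be evaluated at compile time, but for small, tricky helpers it doubles as a specification a reader sees immediately.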

I do not write tests for functions which are compositions of library functions. I do not test pre/post-conditions (these are something different).

And I definitely do not try to have "100% test coverage".

Spot on.

Personally, I fast tracked through 2-4 out of sheer laziness but that's definitely my progression in regards to testing and pretty much everything related to code quality. It includes comments, abstraction, purity, etc...

More generally:

- Initially, you are victim of the Dunning–Kruger effect, standing proudly on top of Mount Stupid. You think you can do better than the pros by not wasting time on "useless stuff".

- Obviously, that's a fail. You realize the pros may have a good reason for working the way they do. So you start reading books (or whatever your favorite learning material is), and blindly follow what's written. It fixes your problems and replaces them with other problems.

- After another round of failure, you start to understand the reasoning behind the things written in the books. Now, you know to apply them when they are relevant, and become a pro yourself.

Sounds like my life story ;).

One thing I do religiously all the time is putting asserts everywhere. It's the only thing you can go crazy on. The rest is indeed always a balancing act.
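As an illustration of asserting preconditions, loop invariants and postconditions inline: lower_index is a hypothetical function, and its O(n) sortedness check is only acceptable because assert() is compiled out when NDEBUG is defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: index of the first element >= key in a sorted vector.
std::size_t lower_index(const std::vector<int>& v, int key) {
  assert(std::is_sorted(v.begin(), v.end()));  // precondition (O(n), debug builds only)
  std::size_t lo = 0, hi = v.size();
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    assert(lo <= mid && mid < hi);  // loop invariant: mid stays inside the range
    if (v[mid] < key) lo = mid + 1; else hi = mid;
  }
  assert(lo == v.size() || v[lo] >= key);  // postcondition: v[lo] is not too small
  assert(lo == 0 || v[lo - 1] < key);      // postcondition: nothing smaller was skipped
  return lo;
}
```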

Hear, hear.

Unit tests are not a goal, they are a tool. Striving for 100% test coverage is nonsense; not testing your software at all levels is bad. Middle ground and moderation are where it's at, not a black-or-white choice. Just like every other tool, you should understand its strengths and weaknesses and apply it properly, not dogmatically, or it will bite you.

I read this complementary one the other day, and one thing that is readily apparent to me is that a lot of people have a lot of different opinions about testing (move to system tests! more unit tests! regression tests!), but many are not asking the zillion dollar question:

What are you testing for?

This is critical because it basically gives you immediately what you should and should not test, and how. While mindless, dogmatic, metric oriented testing is a waste, testing with higher intent and purpose is extremely useful.

An example: test that something working on the current vX also works on vA to vW, and when vZ is out, have the answer ready. Or that a business feature fulfills the requirements. Or that someone not as well versed in the intricate details of the piece you own can be confident it still works after a simple fix while you’re on vacation. It can be one, some, but probably not all.

With that in mind, what to test, what doesn’t make sense to test, and what to test against becomes more clear: should I mock this? or should I run it against some staging environment? Should I perform (yikes? not!) manual testing?

The answers are highly dependent on the piece of code being tested.

Tests are here to help you answer a question, if you aren’t sure what the question is then your tests will miss the point.


I feel like a lot of unit testing is just another form of bikeshedding. It's easily understood, everyone can talk about it, and you can spend a lot of time on it with no clear goal but feel like you're getting something done.

> "Striving for 100% test coverage is nonsense"

100% coverage of what exactly? Tests that go through all your lines of code without testing any of the logic are useless. If you want to be thorough, you need mutation testing, which tests the quality of your unit tests by mutating your logic (changing a > to >=, a + to -, etc.) and then expecting at least one test to fail. If no test fails, that piece of logic wasn't tested.
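To make the idea concrete, here is a hand-written version of what a mutation-testing tool (for example PIT for Java or Mull for C/C++) automates. isAdult and both suites are hypothetical: the mutant swaps >= for >, the weak suite never probes the boundary so the mutant survives, and the strong suite kills it.

```cpp
#include <cassert>

// Original logic under test (hypothetical rule).
bool isAdult(int age) { return age >= 18; }

// A mutant a tool might generate: `>=` replaced by `>`.
bool isAdultMutant(int age) { return age > 18; }

// A suite "passes" for an implementation if all of its checks hold.
template <class F>
bool weakSuite(F f) { return f(30) && !f(5); }  // never probes the boundary

template <class F>
bool strongSuite(F f) { return f(30) && !f(5) && f(18) && !f(17); }
```

weakSuite passes for both isAdult and isAdultMutant, so the mutant survives; strongSuite passes only for the original. Both suites give isAdult full line coverage, which is exactly why coverage alone says little here.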

Without that, it's entirely possible your high code coverage doesn't actually test anything meaningful. Also, this sort of logic is exactly the kind of stuff you want to unit test. All the standard plumbing boilerplate code is not something that needs to be unit tested. The logic does.

and it's not just that, for example branch coverage only says that you branched but not how, a complex branch might need more than one test to be fully, or decently, tested.

Right, essentially you have to be aware that you are testing the state space the code might end up in, which is very different from just hitting every line of code or every branch.
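A small example of the difference, assuming a hypothetical permission check: two tests give full branch coverage of the if statement, yet isAdmin is false in both, so a version that forgets admins passes them too. Checking the whole 2x2 input space catches it.

```cpp
#include <cassert>

// Intended rule (hypothetical): owners and admins may edit.
bool canEdit(bool isOwner, bool isAdmin) {
  if (isOwner || isAdmin) return true;
  return false;
}

// A buggy version that forgets admins.
bool canEditBuggy(bool isOwner, bool /*isAdmin*/) {
  if (isOwner) return true;
  return false;
}

// One case per branch outcome of the `if`: full branch coverage,
// but the effect of isAdmin is never checked.
template <class F>
bool branchCoverageSuite(F f) { return f(true, false) && !f(false, false); }

// Every point of the 2x2 input space.
template <class F>
bool stateSpaceSuite(F f) {
  return f(true, false) && f(false, true) && f(true, true) && !f(false, false);
}
```

Exhaustive enumeration only scales to tiny input spaces, but the point carries over: what matters is which states are exercised, not which lines.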

On that note it is a great tool during development to get a piece working without connecting it to the broader application. Test driven development gives a nice debugging context that is easier to work with. The code coverage and regression part comes as a nice bonus feature.

I’d also like to add that if you contribute code to an open source project, it is extremely beneficial to have iron-clad unit tests. Since there are so many devs, it would be easy for someone to accidentally break something you already fixed.

The benefit of 100% test coverage is that there are no more discussions about what to test and what not to. When in doubt, test it. In larger groups of developers there are otherwise ongoing discussions about whether A needs to be tested or not. I have seen culture wars around this, between people who don't want to test (and in the eyes of others never test enough) and vice versa. Especially with a diverse development force with different ages, seniority and cultural backgrounds.

It's often easier to just aim for 100% test coverage instead (with excluding some categories of files).

EDIT: I would not and did not start with 100% unit testing. But if there are ongoing culture wars and discussions didn't lead to a workable compromise, 100% test coverage worked for me and after some days test coverage was a non issue.

> just aim for 100% test coverage instead (with excluding some categories of files).

That's where the 'gaming' comes in.

The tests start just going through lines without hitting a single expect statement.
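What that gaming looks like, in a hypothetical C++ example: the first test executes every line and branch of grade(), so a coverage tool would report it as fully covered, but it checks nothing.

```cpp
#include <cassert>
#include <string>

// Hypothetical function under test.
std::string grade(int score) {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  return "C";
}

// "Gamed" test: runs every line and branch of grade(), so coverage reports 100%,
// but nothing is checked; grade() could return anything and this still passes.
void testGradeGamed() {
  grade(95);
  grade(85);
  grade(10);
}

// Same coverage, but every result (including the boundaries) is asserted.
void testGradeReal() {
  assert(grade(95) == "A");
  assert(grade(90) == "A");
  assert(grade(89) == "B");
  assert(grade(80) == "B");
  assert(grade(79) == "C");
}
```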

The ignore files start becoming battlegrounds in the PRs because people just exclude half the damn project.

We just have a simple rule... if you wrote code, you have to write coverage for it. If it breaks and your test doesn't catch the breakage, the bug fix goes back to you. Some people will ask "but what about what I'm working on now", you'll have to communicate that you feel your previous work was far more important.

> the bug fix goes back to you. Some people will ask "but what about what I'm working on now", you'll have to communicate that you feel your previous work was far more important.

this feels punitive, especially in the eyes of management. unless you're in a safety critical area where fully testing every code path is a hard requirement, people will eventually write bugs.

i'd rather work somewhere that recognizes defects occur and has a fast iterative process to push out new changes rather than one based on shame for having written a bug.

> has a fast iterative process to push out new changes rather than one based on shame for having written a bug.

That is the fastest, most iterative process we have found so far... as the expert on the original code, you are able to deliver the best outcome.

You're not being shamed for writing a bug, you're being shamed for not testing your code.

Even with 100% coverage, that doesn't mean you've found every bug that can possibly be found with unit tests. Your tests could always cover more inputs, more situations, etc.

No, you can't find bugs that you can't think of.

Property based testing and random values in unit tests find lots of bugs you didn't think of though.
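A hand-rolled sketch of property-based testing using only the C++ standard library: generate many random inputs and check properties that must hold for all of them (here, the output is sorted and is a permutation of the input). insertionSort is a stand-in for whatever code is under test. Dedicated libraries such as QuickCheck (Haskell) or RapidCheck (C++) add input shrinking and better reporting, which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Stand-in for the code under test: a hand-written insertion sort.
void insertionSort(std::vector<int>& v) {
  for (std::size_t i = 1; i < v.size(); ++i) {
    int key = v[i];
    std::size_t j = i;
    while (j > 0 && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
    v[j] = key;
  }
}

// Property check: for `trials` random inputs, the output must be sorted and a
// permutation of the input. Returns false on the first counterexample.
bool checkSortProperties(int trials, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> len(0, 20), val(-5, 5);
  for (int t = 0; t < trials; ++t) {
    std::vector<int> input(len(rng));
    for (int& x : input) x = val(rng);
    std::vector<int> out = input;
    insertionSort(out);
    if (!std::is_sorted(out.begin(), out.end()) ||
        !std::is_permutation(out.begin(), out.end(), input.begin(), input.end()))
      return false;
  }
  return true;
}
```

A fixed seed keeps failures reproducible, and the small value range makes duplicate elements likely, which is a case hand-picked examples often miss.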

> you can't find bugs that you can't think of

Rather, you won't find bugs that you choose not to think of because you've let the "100%" number lull you into complacency, even though you know it's 100% of lines/branches, not 100% of inputs.


After 40 years of programming, my problem is still that I make bugs, and the ones I make come from not thinking about edge cases or from wrong assumptions, not from being lulled by writing tests to meet a 100% number.

But personalities differ and if being lulled into security by writing towards a 100% number is a problem for you I would be careful, I totally agree here.

I agree that fuzz testing + linting can help you. However, from my perspective, lower-level tests help you build trust in the software you're releasing.

100% test coverage is pretty easy to game as long as the metric is known. Just exercise all paths and write as few assertions as you can.

I'd prefer <100% coverage plus discussions about what to test (and how) much more than working with a test suite built on the wrong incentives.

If you are someone who games metrics for his benefit or hire people who are gaming metrics for their benefit, I assume yes, this metric is very easy to game as are most metrics. Metric systems are not cheater proof.

This argument, that it avoids discussions, does make sense but only if you're in a team where discussions tend to be a waste of time.

When my colleagues are knowledgeable and open-minded, I would embrace every opportunity to have a good discussion.

I like thinking about the trade off between a simpler rule that is mostly right vs. decisions require judgement and consensus. I think the simpler rule is usually the better side of the trade-off.

But in this case I think the cure might be worse than the disease. Tests for plumbing code often end up being brittle tests of methods getting called on mocks in the right order. People will notice that these require a lot of toil to keep them running as code changes while providing very little benefit in avoiding mistakes. People will rankle at being told that they must write these tests, which they can see are a waste of time.

I've done it both ways. I'm much happier with my work when I'm not trying to write tests that are tedious and don't seem to provide any value, in order to hit an arbitrary coverage metric. I suspect my teammates feel the same way, so on teams where I have input into the decision on this, I do not advocate for 100% coverage. It does make it harder to have the discussion of which tests should and shouldn't be written, but I think it's worth that cost.

Yes I have seen bad unit tests very often.

Writing good testing code is harder than writing business code. Junior developers especially struggle with this, most often because many companies don't write enough tests for them to learn how to write good tests.

And if you're in an environment where this is a non-issue, I think that's great. Don't fix something that doesn't need to be fixed.

Yup. The debate over unit testing, being political, is a far bigger impediment to progress than the actual tests, which are a technical hurdle. It's essentially the same reason for linters and auto-formatters.

It has a side benefit: it forces devs to write testable code, which inclines them towards reasonably factored code.

> It's often easier to just aim for 100% test coverage instead (with excluding some categories of files).

Congratulations, now you have a war over which categories of files are excluded from the "100% test coverage" rule. ;)

Thank you!

"larger group of developers" is the phrase that caught my attention. Humans don't scale well. This is where microservices do become attractive. This service is owned by a small team, and that team makes these types of judgements. It may be very different from how other service is owned and maintained, and that's ok.

From my limited experience you get cross team discussions about unit testing, especially if one microservice has too many bugs in the eyes of other teams giving development a bad reputation or making working with a microservice hard. Especially if it breaks with releases and other teams get paged.

Yes. You move the "people don't scale well" issues to "multiple teams don't scale well".

Rule by diktat is a recipe for demotivated staff.

Ongoing culture wars are a recipe for demotivated staff.

Perhaps I am wrong, and I would not start with a 'diktat' for obvious reasons.

As a manager, you didn't have discussions about the necessary level of code coverage? I would be interested in how you managed unit testing without a 'diktat', and how it fit in with integration testing and exploratory testing. What level did developers in your department usually find "adequate"? If you considered it too low, how did you raise test coverage as a manager without defining a coverage level?

This is dangerously close to not understanding what you are doing. _Thinking_ about what to test has the benefit of making you think, not just cargo-culting.

Obviously you need to think about how to test. Usually this leads to finding omissions in your code and adding edge cases.

Exactly, the 80-20 rule also applies to unit tests. I don't have 100% coverage on big projects, but everything I write that's meant to go into production is done with TDD anyway, so there are always enough tests to prevent a junior from breaking my stuff. I also save a lot of time because, thanks to TDD, I don't need to test much manually, and most of the time not at all; I just wait for user feedback. It's always much easier to get code working in real conditions if it already works in test conditions, not the other way around.

there is real-world research that actually shows that the 20/80 rule works for unit tests: the last 20% hardly catches any issues and contributes very little to quality

And like other tools, their importance depends on the rest of the toolset. In many shops, tight schedules and management by product managers or people who are too removed from the code cause you to compromise every other principle of responsible, sane coding. When this happens, unit tests are your only shield from doom. If everybody knows how to, and is allowed to, write sane, good code with reasonable time to build it, the unit tests are nice to have but not a must.

Yeah, but those QA dashboards monitored by management....

Actually it is pretty common to have 100% coverage with some extra redundancy too (where some things get accidentally covered multiple times). Striving for 100% is indeed nonsense, but having 100% coverage is usually accidental in clean code that you want to work, and merely a by-product of TDD.

I strive for working code. Sometimes I miss something in the TDD cycle and don’t have 100% and it is that which usually comes back to bite you.

I have never found 100% test coverage has bitten me, dogmatic or otherwise.

That heavily depends on the size of your codebase and perhaps also the language you are writing in. Writing in C++, for example, I often have switch statements in the form of:

  switch(type) {
  case X: return HandleX();
  case Y: return HandleY();
  default: // should be unreachable
    throw InternalException("Unsupported type!");
  }

Now if all goes well the default case will never be covered. At some point I thought "why have this code if it's not supposed to run; let's rewrite this so we can get 100% code coverage!", and I ended up with the following code:

  switch(type) {
  case X: return HandleX();
  default:
    assert(type == Y); // compiled out when NDEBUG is defined (release)
    return HandleY();
  }

Now we can get 100% code coverage... except the code is much worse. Instead of an easy-to-track-down exception, we now trigger either an assertion (debug) or weird undefined behaviour (release) when the "not supposed to happen" inevitably does happen, e.g. because new types were added that were not handled before.

Is worse code worth getting 100% code coverage? In my eyes, absolutely not. I think good code + testing should be able to reach at least 90% typically, likely 95%, but 100% is often not possible without artificially forcing it and messing up your code and/or making it much harder to change your code later on.

Why can't you have a test that checks if the correct exception type is thrown on invalid input? The exception is part of your API too.
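A sketch of such a test, with hypothetical stand-ins for the types in the snippets above (the real InternalException and enum aren't shown): the helper returns true only if InternalException specifically is thrown, so a different exception type, or no exception at all, fails the check.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the types in the comment.
struct InternalException : std::runtime_error {
  using std::runtime_error::runtime_error;
};
enum class Type { X, Y, Z };

int handle(Type type) {
  switch (type) {
  case Type::X: return 1;
  case Type::Y: return 2;
  default: throw InternalException("Unsupported type!");
  }
}

// Test helper: true only if handle() throws InternalException for this input.
bool throwsInternalException(Type type) {
  try {
    handle(type);
  } catch (const InternalException&) {
    return true;
  } catch (...) {
    return false;  // wrong exception type
  }
  return false;  // no exception at all
}
```

As the reply below points out, this only works if the invalid input can be constructed from a test, which is harder when the function is internal.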

This behavior occurs in internal functions and is not triggerable by the user. The only way to trigger this behavior would be to create unit tests that test small internal functions by feeding them specifically invalid input. This is possible, but I would argue this falls under "dogmatically trying to reach 100% code coverage". Testing small internal functions adds very little value and is detrimental to a changing codebase. After adding these tests every single change you make to internals will result in you needing to hunt down all these tiny tests, which adds a big barrier to changes for basically no pay-off (besides a shiny "100%" badge on Github, of course).

As always, I think the answer here is more along the lines of "it depends." Making an existing function more performant is not an uncommon task, and a well-thought-out test suite makes that leaps and bounds easier, even for small, internal functions.

It's arguable that this is a programming bug and not really recoverable, so throwing doesn't make much sense.

You can be defensive to various degrees about assertions:

1. You can just use assert() to fail in Debug and do nothing in Release.
2. You can be more defensive and define your always_assert() to fail in Release as well.
3. You can double down on the UB with hints to the compiler and provide assume(), which explicitly compiles to UB when it's triggered in Release (using __builtin_unreachable() for example).

About the organization of the switch statement: I agree that the former is better, though I would use assert(false).
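A sketch of the three levels as C++ macros. `ALWAYS_ASSERT` and `ASSUME` are hypothetical names for the `always_assert()` and `assume()` described above, and `__builtin_unreachable()` is a GCC/Clang builtin, so this assumes one of those compilers. All three are used together only for comparison; in practice you would pick one:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Level 1 is plain assert(): checked in debug, compiled out when NDEBUG is set.

// Level 2: checked in every build type, reports and aborts on failure.
#define ALWAYS_ASSERT(cond) \
  do { \
    if (!(cond)) { \
      std::fprintf(stderr, "%s:%d: assertion failed: %s\n", __FILE__, \
                   __LINE__, #cond); \
      std::abort(); \
    } \
  } while (0)

// Level 3: an ordinary assert() in debug builds; in release builds a false
// condition reaches __builtin_unreachable(), i.e. undefined behaviour that
// the optimizer is allowed to exploit.
#ifdef NDEBUG
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) assert(cond)
#endif

int divide(int a, int b) {
  assert(b != 0);         // level 1
  ALWAYS_ASSERT(b != 0);  // level 2
  ASSUME(b != 0);         // level 3
  return a / b;
}
```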

Indeed it is a programming bug - but programming bugs happen. In my experience writing programs as if bugs will not happen is typically a bad idea :)

Throwing an exception here is basically free (just another switch case) and gives the user a semi-descriptive error message. When they then report that error message, I can immediately find out what went wrong. Compared with a report about a segfault (with maybe a stack trace), the former is significantly easier to debug and reason about.

always_assert would provide a similar report, of course. However, as we are writing a library, crashing is much worse than throwing an internal error. At worst, an internal error means our library is no longer usable, whereas a crash takes the host program down with it.

__builtin_unreachable() is your friend.

Better yet, omit that default case, so that in the future when you do add a new value to the enum, the compiler will warn you and force you to add a new case.

But I agree with your general thesis that it's just not worth getting to 100% coverage.
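A sketch that combines the omit-the-default suggestion with the original throw, reusing the hypothetical `Type` and `InternalException` names. With no `default` label, `-Wswitch` (enabled by `-Wall` in GCC, on by default in Clang) reports any enumerator added later without a case, and the throw after the switch still catches out-of-range values at run time:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Type { X, Y };  // hypothetical; adding Z without a case triggers -Wswitch

struct InternalException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

std::string describe(Type type) {
  // No default label, so the compiler can report unhandled enumerators.
  switch (type) {
    case Type::X: return "x";
    case Type::Y: return "y";
  }
  // Reached only for values outside the enumerators (e.g. a bad cast or
  // corrupted data), so keep the descriptive error instead of silent UB.
  throw InternalException("Unsupported type!");
}
```

The trailing throw is still a line that normal inputs never reach, so this does not by itself get you to 100% coverage.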

If it didn't catch any bugs, either during initial development or through later changes, then it bit you via wasting your time. I don't think it's fair to say that having tests necessarily makes the code under test any cleaner.

> Striving for 100% test coverage is nonsense

Tell that to the SQLite guy.

Striving for very high test coverage seems like exactly the kind of thing you want for your database infrastructure.

And that's exactly why you need fewer tests in your project that uses a DB: there's no need to test the DB because it's covered by its own tests already.

Interactions with your DB are often the most fragile piece of your code, because there's an impedance mismatch between the language you're writing and SQL. Some languages/frameworks abstract this more safely than others, though.

Interaction, in general, is where many, if not most, errors lie. Unit testing verifies that things work in isolation. But if you code up your unit tests for two components that interact, but with different assumptions, then the unit tests alone won't do you any good: X generates a map like `%{name => string()}`, Y assumes X generates a map like `%{username => string()}`. Now, hopefully that won't happen if you're writing both parts at the same time, but things like this can happen. Now your unit tests pass because they're based on the assumptions of their respective unit of code, but put it together and boom!
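A C++ sketch of that failure mode, with `std::map` standing in for the Elixir maps; `make_user` (X) and `greet` (Y) are hypothetical names:

```cpp
#include <cassert>
#include <map>
#include <string>

using Record = std::map<std::string, std::string>;

// Unit X: builds a user record keyed by "name".
Record make_user(const std::string& name) { return {{"name", name}}; }

// Unit Y: written assuming X uses the key "username".
std::string greet(const Record& user) {
  auto it = user.find("username");
  return it == user.end() ? std::string("Hello, ?") : "Hello, " + it->second;
}

// Each unit test encodes its own unit's assumption, so both pass.
void unit_tests() {
  assert(make_user("ada").at("name") == "ada");
  assert(greet({{"username", "ada"}}) == "Hello, ada");
}
```

An integration test such as `assert(greet(make_user("ada")) == "Hello, ada");` would fail, while both unit tests stay green.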

Exactly, though I believe there's still a thin line between testing the interaction, and testing the db itself. Just like, the difference between testing some code, and testing the language itself.

You should definitely watch this presentation on SQLite exploits.

SELECT code_execution FROM * USING SQLite; - DEF CON 27 Conference


Don't get me wrong, it's a great product and I use it often, but 100% test coverage does NOT equal 100% safe.

Something like SQLite can actually be unit tested to 95% or so, if you stretch the definition of 'unit test' a bit (don't mock I/O).

Try that with a networked application that takes user input though...

Mocking networked services is something very much worth doing because you can then check that you've set timeouts correctly, handle incomplete or junk responses gracefully, etc. Those are the kinds of hidden problems that can bite you on production deployments.
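A minimal C++ sketch of that idea, assuming a hypothetical `Transport` interface in place of a real HTTP client: one fake records the timeout it was handed and simulates a timeout, another returns a canned (possibly junk) body:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical transport seam; production code would wrap a real HTTP client.
struct Transport {
  virtual ~Transport() = default;
  // Returns the response body, or std::nullopt if nothing arrived in time.
  virtual std::optional<std::string> get(const std::string& path,
                                         std::chrono::milliseconds timeout) = 0;
};

// Client under test: fetches "/temperature" and expects a bare integer.
std::optional<int> fetch_temperature(Transport& transport) {
  const auto body = transport.get("/temperature", std::chrono::milliseconds(500));
  if (!body) return std::nullopt;  // timed out
  try {
    std::size_t used = 0;
    const int value = std::stoi(*body, &used);
    if (used != body->size()) return std::nullopt;  // trailing junk
    return value;
  } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
    return std::nullopt;
  }
}

// Fake that simulates a timeout and records the timeout it was given.
struct TimingOutTransport : Transport {
  std::chrono::milliseconds seen{0};
  std::optional<std::string> get(const std::string&,
                                 std::chrono::milliseconds timeout) override {
    seen = timeout;
    return std::nullopt;
  }
};

// Fake that always replies with a fixed body.
struct CannedTransport : Transport {
  std::string body;
  explicit CannedTransport(std::string b) : body(std::move(b)) {}
  std::optional<std::string> get(const std::string&,
                                 std::chrono::milliseconds) override {
    return body;
  }
};

void test_timeouts_and_junk() {
  TimingOutTransport slow;
  assert(!fetch_temperature(slow));
  assert(slow.seen == std::chrono::milliseconds(500));
  CannedTransport junk("<html>502 Bad Gateway</html>");
  assert(!fetch_temperature(junk));
}
```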

You can mock networks all you want and you'll still cover only what you can think of.

It will always break on the user's internet, because it's too diverse to predict.

Doesn't mean you can't have some networking unit tests, just that you shouldn't believe in them too much.

Edit: you said services. Are you thinking of servers? I'm thinking of clients.

"mocking networked services" is exactly what you would do when testing clients.

And it doesn't have to be a static mock. It's not too hard to inject a fuzzer into your mock service response, although that's probably left to a separate testing routine, and not part of your unit test setup. But if you have no mock for your network service, you can't fuzz it either.
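A sketch of a fuzzing mock, assuming the network call is injected as a hypothetical `Fetch` callable: the fake service returns random byte strings from a seeded `std::mt19937`, and the only property checked is that the client never throws:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <random>
#include <stdexcept>
#include <string>

// Hypothetical seam: the network call is injected as a callable.
using Fetch = std::function<std::optional<std::string>()>;

// Client under test: expects the service to reply with a bare integer.
std::optional<int> fetch_temperature(const Fetch& fetch) {
  const std::optional<std::string> body = fetch();
  if (!body) return std::nullopt;
  try {
    std::size_t used = 0;
    const int value = std::stoi(*body, &used);
    if (used != body->size()) return std::nullopt;
    return value;
  } catch (const std::exception&) {
    return std::nullopt;
  }
}

// Mock service whose every response is 0-16 random bytes.
Fetch fuzzing_service(std::mt19937& rng) {
  return [&rng]() -> std::optional<std::string> {
    std::uniform_int_distribution<int> length(0, 16), byte(0, 255);
    std::string body(static_cast<std::size_t>(length(rng)), '\0');
    for (char& c : body) c = static_cast<char>(byte(rng));
    return body;
  };
}

// Property: for any response, the client returns a value or nullopt, never throws.
bool survives_fuzzing(unsigned seed, int iterations) {
  std::mt19937 rng(seed);
  const Fetch service = fuzzing_service(rng);
  for (int i = 0; i < iterations; ++i) {
    try {
      (void)fetch_temperature(service);
    } catch (...) {
      return false;
    }
  }
  return true;
}
```

Seeding the generator keeps any failure reproducible; the same fake could be driven much longer by a separate fuzzing job outside the unit test suite.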

And yet they had bugs and could lose data.

Which shows 100% unit test coverage is not better than spending that time in other kinds of tests.

Unit tests don't account for timing-based effects, and you generally can't test for ACID properties.


SQLite is suitable for 100% coverage.

In a lot of application or workflow-style code, 100% coverage is hard to reach because some paths are rarely triggered.

The rarely triggered code is exactly where you want unit tests.

That's an opportunity to describe, in code, what that path is supposed to do, and then make sure it does it.

Perhaps the problem isn't mocking dependencies, but trying to hide the fact that GetSolarTimesAsync needs two pieces of data to work: a date and a location.

But the original signature is just this:

    public async Task<SolarTimes> GetSolarTimesAsync(DateTimeOffset date)
That introduces a lot of complexity:

* The SolarCalculator needs to be able to work out its own location, so it needs a LocationProvider

* SolarCalculator needs to be IDisposable since it owns a LocationProvider

* The SolarCalculator will need more methods if it ever needs to calculate the times in a different location

* If fetching the location is slow, but the application needs to calculate times for multiple dates (eg to build up a table of times), then the SolarCalculator will need a method that takes in an array of dates to be efficient

But all that could be solved by making the function take all of the arguments it needs to return its value:

    public SolarTimes GetSolarTimes(DateTimeOffset date, Location location)
No location provider needed, no IDisposable, just one efficient stand-alone method.

Unit testing this is now just:

    var calculator = new SolarCalculator();
    var actual = calculator.GetSolarTimes(new DateTimeOffset(...), new Location(...));
    var expected = new SolarTimes(...);
...so, perhaps the issue isn't that unit testing is a bad idea, but that code which is hard to use in a unit test might also be hard to use in a wider application? And perhaps the fix is to make the code easier to use?

Completely agree. The author is trying to blame tests for code complexity. GetSolarTimes() is a simple function evaluating an equation - treat it as such.

If your code is broken down clearly into logic and plumbing, unit testing the logic becomes super easy. It allows you to construct software using blocks you have absolute confidence in. Unit testing plumbing is harder, and that's when integration testing shines.
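Treated that way, the calculation needs no mocks at all. Here is a rough C++ sketch of the refactored signature. It assumes a day-of-year integer in place of `DateTimeOffset` and uses a textbook approximation (declination from a cosine fit, solar noon shifted by longitude, no equation-of-time or refraction correction), so the numbers are only illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical value types; day_of_year stands in for DateTimeOffset.
struct Location {
  double latitude_deg;   // north positive
  double longitude_deg;  // east positive
};

struct SolarTimes {
  double sunrise_utc_h;
  double solar_noon_utc_h;
  double sunset_utc_h;
};

// Pure function: everything it needs is a parameter, so nothing to mock.
SolarTimes get_solar_times(int day_of_year, Location loc) {
  const double deg = std::acos(-1.0) / 180.0;  // radians per degree
  // Approximate solar declination in degrees.
  const double decl = -23.44 * std::cos(deg * 360.0 / 365.0 * (day_of_year + 10));
  // Solar noon moves 1/15 hour (4 minutes) per degree of longitude.
  const double noon = 12.0 - loc.longitude_deg / 15.0;
  // Sunrise/sunset hour angle; clamping gives 0 h of daylight in polar night
  // and 24 h under the midnight sun.
  const double cos_h = std::clamp(
      -std::tan(loc.latitude_deg * deg) * std::tan(decl * deg), -1.0, 1.0);
  const double half_day_h = std::acos(cos_h) / deg / 15.0;
  return {noon - half_day_h, noon, noon + half_day_h};
}

// A unit test is just inputs and expected outputs.
void test_equator_has_twelve_hour_day() {
  const SolarTimes t = get_solar_times(80, Location{0.0, 0.0});
  assert(std::fabs(t.sunrise_utc_h - 6.0) < 1e-9);
  assert(std::fabs(t.sunset_utc_h - 18.0) < 1e-9);
}
```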

I agree with you, and that's also my usual observation: unit tests tend to show how decoupled and reusable your code is. If a function gets hard to unit test, this usually points to an architectural issue.

if i got a dollar every time i started testing a piece of code, realized that it was waaaay too complex, and refactored the code so the tests were easier to write/understand... i would have a lot of dollars.

100% of the time, it was the right idea and the code became a lot better.


The author's tests are overly complex. Instead of gleaning the actual value of this insight, which is that you're not cleanly separating your inputs and your outputs, the author concludes that unit tests are a waste of time.

Nope. Unit tests are a tool, but writing proper unit tests and understanding the value they give you is an art and a science. It requires experience and deliberate design.

Exactly! Thank you!

It seems that most devs (me included) learn at school to write pure functions, which is great. Then they come to the industry and all of a sudden the "parseXml" function takes an FTP port as a parameter... ("because in my case the XML was on an FTP server!")

Why is there no CS course that explains this kind of stuff?
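A small C++ sketch of the split, with a hypothetical `key=value` format standing in for XML so the example stays self-contained: the parser takes text, and fetching over FTP (shown only as a hypothetical `download` call in a comment) stays at the edge:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Pure: takes the document's text and knows nothing about FTP or ports.
std::map<std::string, std::string> parse_config(const std::string& text) {
  std::map<std::string, std::string> result;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto eq = line.find('=');
    if (eq == std::string::npos) continue;  // ignore malformed lines
    result[line.substr(0, eq)] = line.substr(eq + 1);
  }
  return result;
}

// The I/O lives in a thin wrapper at the edge, e.g. (hypothetical):
//   std::string download(const std::string& host, int port, const std::string& path);
//   auto config = parse_config(download("example.com", 21, "/config.txt"));

void test_parse_config() {
  const auto cfg = parse_config("host=db\nport=5432\nnot a pair\n");
  assert(cfg.size() == 2);
  assert(cfg.at("port") == "5432");
}
```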

Mentioned elsewhere but that is largely the heart of Functional Core, Imperative Shell: https://www.destroyallsoftware.com/screencasts/catalog/funct...

(And I am sure a bunch of other similar but differently-named concepts)

I thought about this too when I read the article but then I thought that it might not always be possible to rewrite the code like that in other cases. Maybe the example used in this article was just not the best example.

Didn't expect to scroll this far to find this.

TL;DR: GetSolarTimes(Location, Date) is a unit-testable function.

Had some thought been put into writing with unit tests, there would be no problems with that example.

that feels pretty dismissive / reductive to me...

seems like you're all saying that

  var actual = calculator.GetSolarTimes(new DateTimeOffset(...), new Location(...)); 
is JUST SIMPLER and better than

  public async Task<SolarTimes> GetSolarTimesAsync(DateTimeOffset date) 
with an internal location provider as a dependency because it's easier to test.

but i think that ignores the reason why DI containers were invented in the first place and assumes that the solar calculator is just a simple entrypoint-type application, rather than being a component in real application. You might have 20 layers of THING, somewhere inside which, this solar time calculator lives and is used... and you still have to get Location from SOMEWHERE to pass it into the calculator.

so what happens when Whatever (the thing that uses the location provider to get the location and passes it along) needs to be tested? and through how many layers of stack do you need to pass Location before you realize that every test of every intermediate layer needs to know about location, but only for the purpose of passing it along?

I think it's a more nuanced case than you're making it seem. Beyond some level of complexity in an application, it becomes simpler to co-locate dependencies where they're actually used.

The author puts forward exactly same point, somewhat closer to the conclusion of the article:

  public SolarTimes GetSolarTimes(Location location, DateTimeOffset date)

I disagree with the notion that making your code testable in isolation serves no other purpose than to write unit tests. It very specifically forces you to think about how and why each piece of code is coupled with other code, and generally requires you to make this coupling as loose as possible, to make testing in isolation possible. Loosely coupled code is also easier to reason about and easier to refactor. So testing doesn't just provide you with the value of tests, it also nudges you toward a saner architecture.

I strongly disagree for the reasons listed in the article; it induces the construction and testing of abstractions which exist solely to enable testing, and do not enable simpler reasoning.

Refactoring is even worse. Refactoring after you've split something up into multiple parts and tested their interfaces in isolation is far more work. Any refactoring worth a damn changes the boundaries of abstractions. I frequently find myself throwing away all the unit tests after a significant refactoring; only integration tests outside the blast radius of the refactoring survive.

> the construction and testing of abstractions which exist solely to enable testing

One of his examples from the article is injecting, IOC-style, the HttpClient instance into his LocationProvider class. He insists that this is a waste of time, and that the automated tests (if you have any at all), should be calling out to the remote service anyway. I can't disagree more! Hopefully you're configuring the automated tests to interact with a test/dev instance of the service and not the production instance (!). But what invariably happens is that the tests fail because the dev instance happened to be down when they ran. And they take a long time to run anyway, so everybody stops running them since they don't tell you anything useful anyway. This is even worse when the remote service is not a web service but a database: now you have to insert some rows before you run the test and then remember to delete them... and hopefully nobody else is running the same test at the same time! To be useful in any way, automated tests must be decoupled from external services, which means mocking, which means some level of IOC.

On the other hand, he also introduces the example of SolarCalculator mocking LocationProvider. I agree that that level of isolation is overkill and will unapologetically write my own SolarCalculator unit test to invoke a "real" LocationProvider with a mocked-out HttpClient, and I'll still call it a unit test. (On the other hand, the refactored design with the ILocationProvider really is better anyway.)

So I think the reason people argue about this is because they can't really agree on what constitutes a unit test. I'd rather step back from what is and isn't a unit test and focus on what I want out of a unit test: I want it to be fast, and I want it to be specific. If it fails, it failed because there's a problem with the code, and it should be very clear exactly what failed where. A bit of indirection to permit this is always worthwhile.
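A sketch of that middle ground in C++, assuming a hypothetical `HttpGet` callable as the only seam, a made-up URL, and a plain-text "lat lon" reply. The test runs the real `LocationProvider` and fakes just the HTTP call:

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical seam: the only dependency the test replaces.
using HttpGet = std::function<std::string(const std::string& url)>;

struct Location {
  double latitude;
  double longitude;
};

// The real provider, not a mock: it builds the request and parses the reply.
class LocationProvider {
 public:
  explicit LocationProvider(HttpGet get) : get_(std::move(get)) {}

  Location current() const {
    // Hypothetical endpoint replying with plain text "lat lon".
    std::istringstream reply(get_("https://geo.example/lookup"));
    Location loc{};
    reply >> loc.latitude >> loc.longitude;
    return loc;
  }

 private:
  HttpGet get_;
};

// Exercises the provider's real parsing with only the HTTP call faked.
void test_provider_parses_reply() {
  std::string requested;
  const LocationProvider provider([&requested](const std::string& url) {
    requested = url;
    return std::string("50.45 30.52");
  });
  const Location loc = provider.current();
  assert(requested == "https://geo.example/lookup");
  assert(std::fabs(loc.latitude - 50.45) < 1e-9);
  assert(std::fabs(loc.longitude - 30.52) < 1e-9);
}
```

The test stays fast and specific in the sense described above, without a mock for every class in between.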

Maybe your units are too big. Unit tests are tricky because they require coming to a personal and team agreement on what a 'unit' of functionality is.

I find the same issue of throwing away tests when I'm writing small-scale integration tests with JUnit. Usually I'm mocking out the DB and a few web service calls, so those tests become more volatile because more of their surface is exposed. But smaller, function- and class-level tests can have a really good ROI, and they do push you to design for testing, which makes everything a bit better imo.

It's normally the opposite. Unit tests are too small.

If you unit test all of the objects (because they're all public) and then refactor the organisation of those objects, all your tests break. Since you've changed the way objects talk to each other, all your mock assumptions go out the window.

If you define a small public api of just a couple of entry points, which you unit test, you can change the organisation below the public api quite easily without breaking tests.

Where to define those public apis is a matter of skill working out what objects work well together as a cohesive unit.

The notion of a public API is really more fluid in the context of internal codebases as well. It's important to maintain your contract for forwards/backwards compatibility when publishing a library for the world. When you can reliably rewrite every single caller of a piece of code, you don't have that problem.

I usually test whatever subset of code can be tested with fewer than about a dozen test cases. If it's larger, I test logical parts of it with mocks at the leaves. For small projects that is usually a single controller with only some mocks at the edge of the system (database, external APIs, etc.). Refactoring code where there is one test suite per class can be a nightmare.

> I frequently find myself throwing away all the unit tests after a significant refactoring

Good, this time you can get it right.

if it is worth rewriting the code it's worth rewriting the tests.

Seems to defeat the point of the tests. At least partially.

Unit tests test that the units do what they are supposed to do. Functional tests test that parts of the system do what they're supposed to do.

If you change the implementation for a unit, a small piece of code, then the unit test doesn't change; it continues to test that the unit does what it's supposed to do, regardless of the implementation.

If you change what the units are, like in a major refactor, then it makes sense that you would need whole new unit tests. If you have a unit test that makes sure your sort function works and you change the implementation of your sort, your unit test will help. If you change your system so that you no longer need a sort, then that unit test is no longer useful.

I don't see why the fact that a unit test is limited in scope as to what it tests makes it useless.

If a particular test never finds a bug in its lifetime (and isn't used as documentation either), you might as well not have written it, and the time would be better spent on something else instead--like a new feature or a different test.

Of course, you don't know ahead of time exactly which tests will catch bugs. But given finite time, if one category of test has a higher chance of catching bugs per time spent writing it, you should spend more time writing that kind of test.

Getting back to unit tests: if they frequently need to be rewritten as part of refactoring before they ever catch a bug, the expected value of that kind of test becomes a fraction of what it would be otherwise. It tips the scales in favor of a higher-level test that would catch the same bugs without needing rewrites.

> If a particular test never finds a bug in its lifetime (and isn't used as documentation either), you might as well not have written it

That's like saying you shouldn't have installed fire alarms because you didn't wind up having a fire. Also, tests can both 1) help you write the code initially and 2) give a sense of security that the code is not failing in certain ways.

> It tips the scales in favor of a higher-level test that would catch the same bugs without needing rewrites.

Writing higher level tests that catch the same bugs as smaller, more focused tests is harder, likely super-linearly harder. In my experience, you get far more value for your time by combining unit, functional, system, and integration tests; rather than sticking to one type because you think it's best.

My comment went on to say that you don't know ahead of time exactly which tests will prove useful, so you can't just skip writing them altogether. The key point is that if you have evidence ahead of time that a whole class of tests will be less useful than another class (because they will need several rewrites to catch a similar set of bugs), that fact should inform where you spend your time.

To go with the fire alarm analogy and exaggerate a little, it would work like this: you could attempt to install and maintain small disposable fire alarms in the refrigerator as well as every closet, drawer, and pillowcase. I'm not sure if these actually exist, but let's say they do. You then have to keep buying new ones since the internal batteries frequently run out. Or, you could deploy that type mainly in higher-value areas where they're particularly useful (near the stove), and otherwise put more time and money in complete room coverage from a few larger fire alarms that feature longer-lasting batteries. Given that you have an alarm for the bedroom as a whole, you absolutely shouldn't waste effort maintaining fire alarms in each pillowcase, and the reason is precisely that they won't ever be useful.

There are side benefits you mentioned to writing unit tests, of course, like helping you write the API initially. There are other ways to get a similar effect, though, and if those provide less benefit during refactoring but you still have to pay the cost of rewriting the tests, that also lowers their expected value.

To avoid misunderstanding, I also advocate a mixture of different types of tests. My comment is that based on the observation that unit tests depending on change-prone internal APIs tend to need more frequent rewrites, that fact should lower their expected value, and therefore affect how the mixture is allocated.

I get what you're saying and it makes sense to me.

> unit tests depending on change-prone internal APIs

This in particular is worth highlighting. I tend to now write unit tests for things that are getting data from one place and passing it another, unless the code is complex enough that I'm worried it might not work or will be hard to maintain. And generally, I try to break out the testable part to a separate function (so it's get data + manipulate (testable) + pass data).

Sorry, that should be "tend to not write".

I'm definitely a fan of higher level tests that frequently survive refactorings.

I'm not arguing unit tests are useless.

Not if you rewrite/change the tests first, since you know the code currently works and you are safe to refactor the tests. Equally you are safe to change the tests to define the new behaviour, and then follow on with changing the code to make it green.

The point was to change the code structure without changing the tests (possibly to enable a new feature or other change). The challenge being when the tests are at the wrong "level", probably by team policy IME. If you change the tests, how can you be sure the behavior really matches what it was before?

Agreed. I see tests as the double entry accounting of programming, they let you make changes with a lot more confidence that you're only changing what you want and not some unexpected thing.

They're not for catching unknown bugs, they're for safer updates.

You often end up reimplementing the "consumer" module for your code in order to test it. This is problematic because a) it's extra work, b) that fixture layer probably doesn't behave exactly like the real caller code, and c) now you have to keep those two implementations in sync.

There shouldn't be anything particularly complex in test code, limiting the extent of any "reimplementation". Moreover, if the test client is different from real clients and not "in sync" it's a good thing: unit tests that do something differently within the limits of documented acceptable behaviour expose assumptions and bugs. For example, suppose outputs should be sorted in a certain way, but they are sorted if a client presents sorted inputs and not because they are actually checked and sorted: a test with random inputs can expose the hole.
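A C++ sketch of that example: a hypothetical `combine_sorted` whose contract promises sorted output but which only merges, so it is correct only when callers happen to pass sorted inputs. A randomized test that stays within the contract exposes it:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Contract (hypothetical): returns every element of a and b in ascending order.
// Deliberate bug: it only merges, so the output is sorted only if both inputs are.
std::vector<int> combine_sorted(const std::vector<int>& a, const std::vector<int>& b) {
  std::vector<int> out;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
  }
  out.insert(out.end(), a.begin() + static_cast<std::ptrdiff_t>(i), a.end());
  out.insert(out.end(), b.begin() + static_cast<std::ptrdiff_t>(j), b.end());
  return out;
}

// Randomized test within the contract: unsorted inputs are allowed, so the
// output must still be sorted. Returns false on the first violation found.
bool output_always_sorted(unsigned seed, int trials) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> length(0, 8), value(0, 99);
  for (int t = 0; t < trials; ++t) {
    std::vector<int> a(static_cast<std::size_t>(length(rng)));
    std::vector<int> b(static_cast<std::size_t>(length(rng)));
    for (int& x : a) x = value(rng);
    for (int& x : b) x = value(rng);
    const std::vector<int> out = combine_sorted(a, b);
    if (!std::is_sorted(out.begin(), out.end())) return false;
  }
  return true;
}
```

A test that only ever passes sorted vectors, mirroring the real callers, would pass; the fix is to sort copies of the inputs (or reject unsorted ones) inside the function.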

Put another way, if your code is very difficult or complicated to unit test, you've probably abandoned best practices for the language you're writing in somewhere along the way, in the name of expediency.

in the name of expediency

or productivity.

but BeSt PrAcTiCeS

The exercise of writing debugable and testable code is often worth more than the unit tests.


Striving to make your code testable is almost always worth it. Someone might ask this guy to add some error handling to his code, for example. :) Then he will find out that code, however simple, that only "works on my machine", i.e. is proven to work in a single happy-path context, is painful to change. Code that runs in multiple contexts (composed as an app or decomposed for testing) is intrinsically easier to work with and change.

> Loosely coupled code is also easier to reason about and easier to refactor.

Can you expand more on this? I think this is where the author would disagree.

E.g., how is the code easier to reason about or refactor after introducing a location service interface that has only one implementation?

Suppose you have a unit that makes a couple of api calls to set itself up, then performs a computation, then does some sort of storage. Thinking about how you would test this might lead you to an inversion of control approach, and you might isolate side effects. The storage provider might get passed in, both to make it easy to mock, and to reduce coupling and surprise.

I meant expand on how those things make code easier to understand, not why unit testing causes you to adopt them.

Mainly it reduces surprise. If you call an interface with implicit dependencies, you won't know why it's breaking without debugging it and making sure it sets up its dependencies properly. If you call a testable interface with explicit dependencies, you can mock out those dependencies to debug parts individually.

Unit tests have a purpose, which is mostly to protect the programmer against future mistakes. Integration and system tests protect the user against current mistakes.

I've been on projects that focused almost exclusively on unit tests and on projects that focused almost exclusively on integration tests. The latter were far better at shipping actually working code, because most of the interesting problems occur at the boundaries between components. Testing each piece with layer after layer of mocks won't address those problems. Yay, module A always produces a correct number in pounds under all conditions. Yay, module B always does the right thing given a number in kilograms. Let's put them together and assume they work! Real life examples are seldom this obvious, but they're not far off. Also note that the prevalence of these integration bugs increases as the code becomes more properly modular and especially as it becomes distributed.

I firmly believe that integration tests with fault injection are better than unit tests with mocks for validating the current code. That doesn't mean one shouldn't write unit tests, but one should limit the time/effort spent refactoring or creating mocks for the sole purpose of supporting them. Otherwise, the time saved by fixing real problems more efficiently - a real benefit, I wouldn't deny - is outweighed by the time lost chasing phantoms.

What, no. That's exactly the opposite.

Unit tests protect you against current mistakes. They're tied to the exact implementation.

"Right now my function X should call Y on its dependency Z before it calls A on its dependency B. I know that my method should do this, because this is how I designed it now. Let me write a test and expect exactly that."

Integration and system tests will tell you whether in the future your code will still work when you refactor.

"Okay, we rewrote the whole class containing the function. Does running my thing still end up writing ABC into that output file?"

Otherwise I agree with you mostly.

> They're tied to the exact implementation.

If unit tests are tied to an exact implementation, they'll fail on correct behavior, and that's definitely wrong. It shouldn't matter whether X calls Z:Y or B:A first, whether it calls them at all, whether it calls them multiple times, or whether it calls them differently. All that matters is that it gets the correct answer and/or has the same final effect.

Unit tests should be based on a module's contract, not its implementation. This is in fact exactly what's wrong with most unit tests, that they over-specify what code (and all of its transitive dependencies) must do to pass, while by their nature leaving real problems at module interfaces out of scope.
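A C++ sketch of a contract-based test, assuming a hypothetical `Store` interface: an in-memory fake holds state, and the test checks only the end result, not which methods were called or in which order:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical dependency: a key-value store.
struct Store {
  virtual ~Store() = default;
  virtual void put(const std::string& key, int value) = 0;
  virtual int get(const std::string& key) const = 0;
};

// In-memory fake: it records state, not the sequence of calls.
struct FakeStore : Store {
  std::map<std::string, int> data;
  void put(const std::string& key, int value) override { data[key] = value; }
  int get(const std::string& key) const override {
    auto it = data.find(key);
    return it == data.end() ? 0 : it->second;
  }
};

// Unit under test: adds `amount` to a counter.
void increment(Store& store, const std::string& key, int amount) {
  store.put(key, store.get(key) + amount);
}

// Contract-based test: only the observable end state is checked.
void test_increment_contract() {
  FakeStore store;
  increment(store, "hits", 2);
  increment(store, "hits", 3);
  assert(store.get("hits") == 5);
}
```

An interaction-based version would assert that `get` was called before `put`, and would break if `increment` were rewritten (say, to cache values) even though the result stays the same.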

a) Most code in the wild doesn't have an explicit output and instead is orchestration code.

b) Even if you have an output, it's dependent on more complex input of arbitrary types.

Assume that there's a method that returns a value computed by summing the results of method calls on its abstract dependencies.

To do dogmatically correct unit testing you'd pass those 2 mocked dependencies, and have those methods return the values when the right method is called on them.

Then you'd assert that B was called on A, that D was called on C, and that the method under test returns the sum of those returns.

As soon as you move into passing implementations of those 2 dependencies, to anyone dogmatic you're doing integration testing.

Even if the tester isn't being dogmatic, in a lot of cases these inputs are complex enough that building enough actual inputs that are consistent and realistic to cover all the cases is prohibitively costly, so they opt for mocks.

Now, suddenly you just have more code to maintain when making changes, but you feel good about yourself.

The interface on our object (O) that you are describing is:

    O -> int
Your unit test is concerned with narrowing the interface above to:

    O -> int // of specific value based on dependencies
If Os only dependencies are A and C, this can be rewritten to:

    A -> C -> int // of specific value 
Of course if we assume both A and C, themselves, have dependencies we can recursively rewrite the above until we have a very long interface, but instead you have opted to mock (M) them:

    M(A) -> M(C) -> int // of specific value 
You then take it a step further and mock the method calls on each to return a specific value:

    M(A) -> int
    M(C) -> int

    M(A) -> 3
    M(C) -> 5
Okay. Now we can rewrite our interface to:

    3 -> 5 -> int // of specific number
and our test to:

    3 -> 5 -> 8
and make our assertion that the result is indeed the sum of the inputs (not to mention the ridiculous assertions that specific methods were called within the implementation). Yikes... No wonder OOP gets a bad rap. All that for what amounts to a `sum` function.

The designer of the above monstrosity could learn a lot from the phrase "imperative shell, functional core". It sounds like dogma until you are knee deep in trying to test the middle of a large object graph!
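A sketch of that shape in C++, keeping the comment's names: hypothetical classes `A` and `C` expose methods `B()` and `D()`, the shell only gathers their results, and the logic lives in a pure function that needs no mocks:

```cpp
#include <cassert>

// Functional core: the actual logic, trivially unit-testable.
int combine(int a_result, int c_result) { return a_result + c_result; }

// Hypothetical collaborators with methods B and D.
struct A { int B() const { return 3; } };
struct C { int D() const { return 5; } };

// Imperative shell: gathers inputs, then calls the core. It has no logic of
// its own worth mocking; an integration test covers it.
int compute(const A& a, const C& c) { return combine(a.B(), c.D()); }

void test_core() {
  assert(combine(3, 5) == 8);  // no mocks, no call-order assertions
  assert(combine(-2, 2) == 0);
}
```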

No, unit tests aren't tied to the current implementation, they're tied to the current interface. If your programming interface calls for multiple interdependent objects without central coordination, then yes, you should test that. But I would say that you've already started out with code that is too badly structured to allow for testing the units in isolation: you should be able to unit test A without relying on Z at all.

It's integration testing that validates that all your units still combine (integrate) into a working end product. That's not about testing your implementation nor your internal interfaces, that's about testing your program's inputs and outputs.

That's not correct.

All tests protect the programmer against future mistakes. All tests are a protection against regressions.

But yes agreed, integration tests absolutely carry much more value than any unit tests might, specifically because unit tests tend to target things that are essentially implementation details.

The only time I'd say unit tests carry any value is if they're testing some especially important piece of business logic e.g. some critical computation. Otherwise, integration tests rank the highest in the teams I lead.

One interesting thing that is easy to notice about all of the examples in the article is that they are absolutely infested with objects.

I don't have anything against objects, per se, but I think they tend to make unit testing much more difficult to accomplish. The closer your code resembles pure functions, the easier it is to do dependency injection and unit testing.

There are no pure functions the moment you touch any kind of IO.

Plus the same problem arises with modules instead of objects, which traditionally are even harder to customize.

> There are no pure functions the moment you touch any kind of IO.

You can get pretty far with good abstractions and dependency injection. Go's io.Reader and io.Writer interfaces are a great example of this. The resulting functions aren't pure in a technical sense, but they're pretty easy to unit test nonetheless.
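The C++ standard library offers a similar seam: a function written against `std::istream` and `std::ostream` can be given string streams in a test and `std::cin` or a file stream in production. A minimal sketch, with a hypothetical `number_lines` function:

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies `in` to `out`, prefixing each line with its 1-based line number.
// It depends only on the stream abstractions, not on files or the console.
void number_lines(std::istream& in, std::ostream& out) {
  std::string line;
  int n = 0;
  while (std::getline(in, line)) {
    out << ++n << ": " << line << '\n';
  }
}

void test_number_lines() {
  std::istringstream in("alpha\nbeta\n");
  std::ostringstream out;
  number_lines(in, out);
  assert(out.str() == "1: alpha\n2: beta\n");
}
```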

> Plus the same problem arises with modules instead of objects, which traditionally are even harder to customize.

Maybe you could elaborate. I really don't understand what you mean here.

From what I understand, modules just scope names, they don't maintain state. I don't see how they have the same problems as objects.

> You can get pretty far with good abstractions and dependency injection

Which goes back to the article's point of having to write code that is unit test friendly.

Now architecture decisions have to integrate interfaces that wouldn't be needed otherwise.

> Maybe you could elaborate. I really don't understand what you mean here.

Modules keep state via global variables and module-private functions, and they control what is visible through the public API they expose.

Additionally, in languages that support them, they can be distributed as binary-only libraries.

> Which goes back to the article's point of having to write code that is unit test friendly.

> Now architecture decisions have to integrate interfaces that wouldn't be needed otherwise.

You're not wrong.

But in the context of functions, that doesn't seem to me to be particularly onerous. If the worst I'm forced to do is change the type of my parameters to an interface instead of a concrete type, that seems like a pretty small price to pay for easy testability. Certainly a much smaller price than the examples in the article.

You are assuming that interfaces exist as a language concept.

Imagine writing unit tests for a C application, where a module is a translation unit or a static/dynamic library, so you can only do black-box testing.

Now one needs to clutter it with function pointers everywhere, or start faking interfaces with structs, just for the benefit of unit tests.

And with static/dynamic libraries, one might then need to start injecting symbols into the linker to redirect calls to mocking functions.

All just to keep QA dashboards green.

That's how a lot of great C code is written anyway. A C library should abstract out logging, allocation, and IO so that the client code can change them out if need be.

The fact that it makes unit testing easier is just icing on the cake.

Having written tests for enterprise C code, I wouldn't call it a great experience; it's something I'm glad never to repeat.

Mainly due to the linking hacks and low-level debugging sessions required to mock all the necessary calls.

Plus, that was just an example; there are plenty of languages with modules and binary libraries.

I mean, that's why great libraries don't make you do that. There's a lot of crap libraries.

The library's dependencies should all be indirected through whatever context struct you pass to all your calls.

Agreed, and this goes back to the start of the thread: just because a language is more focused on functions doesn't automatically make testing better, unless the code was written with testability as one of the requirements.

Sadly not all code is great.

For C, I've found it's not a test-friendliness thing, though; the great C libraries were doing this before unit testing made its way into their codebases. They dependency-inject IO, memory allocation, and logging because they have no idea what you, as the end user, are going to be using for those. So you pass all that in on an env struct when you initialize the library.

You probably want it rigged up to your own logger instead of just blindly writing to stdout. You probably want the library's allocations tagged somehow on the heap so you can track down memory leaks. You probably don't want it doing IO directly, because of how many different ways there are to do IO.

It's more a function of how incredibly varied C environments are than of design for testability. It just happens to be very testable as an aside.
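A rough Java sketch of that env-struct pattern, with hypothetical names `Env` and `WordCounter`. Java has no user-supplied allocator, so only the logging and IO hooks carry over: the caller hands them over once at construction, and the library never reaches for System.out or the filesystem on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical "env": everything environment-specific the library needs, supplied by
// the caller once at initialization. In C this would be a struct of function pointers.
final class Env {
    final Consumer<String> log;               // where log messages go
    final Function<String, String> readFile;  // how to load a named resource

    Env(Consumer<String> log, Function<String, String> readFile) {
        this.log = log;
        this.readFile = readFile;
    }
}

// Hypothetical library code: it only ever logs and reads through the env it was given.
final class WordCounter {
    private final Env env;

    WordCounter(Env env) {
        this.env = env;
    }

    int countWords(String name) {
        String text = env.readFile.apply(name).trim();
        int words = text.isEmpty() ? 0 : text.split("\\s+").length;
        env.log.accept("counted " + words + " words in " + name);
        return words;
    }

    public static void main(String[] args) {
        List<String> logs = new ArrayList<>();
        // Test-style env: logs captured in a list, "files" served from a lambda.
        Env env = new Env(logs::add, name -> "the quick brown fox");
        System.out.println(new WordCounter(env).countWords("a.txt")); // 4
        System.out.println(logs); // [counted 4 words in a.txt]
    }
}
```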

Yes, there are. What you do is not put any IO calls inside your pure functions, but instead pass in their results as parameters.

Keep the impure code and the pure code separated.
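A minimal sketch of that split, assuming a hypothetical cart file with one price in cents per line and a made-up 10% member discount: `totalCents` is the pure part that unit tests call directly, and `totalFromFile` is the thin impure shell that reads the file and passes the results in.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

final class CartTotals {
    // Pure core: data in, data out, no IO. Unit tests call this directly.
    static long totalCents(List<Long> itemCents, boolean isMember) {
        long sum = 0;
        for (long cents : itemCents) {
            sum += cents;
        }
        // Hypothetical business rule: members get 10% off (integer cents, rounded down).
        return isMember ? sum * 90 / 100 : sum;
    }

    // Impure shell: performs the IO, then passes the *results* to the pure function.
    // This thin layer is what integration tests cover.
    static long totalFromFile(Path cartFile, boolean isMember) throws IOException {
        List<Long> cents = Files.readAllLines(cartFile).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(Long::parseLong)
                .collect(Collectors.toList());
        return totalCents(cents, isMember);
    }

    public static void main(String[] args) {
        System.out.println(totalCents(List.of(1000L, 500L), true)); // 1350
    }
}
```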

Beautifully said, but practically impossible unless the language imposes it as the programming model and you have 100% control over the complete source code.

Yup, this is very true. As long as you return something deterministic, unit testing is easy.

It's where you need to handle mutable state with objects that things get trickier.

Unfortunately, these are exactly the places where you most need tests.

I'm a big fan of constantly returning things rather than holding state in objects, for specifically this reason.
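A small Java illustration of the difference, using hypothetical `MutableTally` and `Tally` classes: the first accumulates hidden state, while the second returns a new value from every call and leaves the old one untouched.

```java
// Stateful version: total() depends on every add() that happened before,
// so a test has to reproduce that history to check one behaviour.
final class MutableTally {
    private long total = 0;

    void add(long amount) {
        total += amount;
    }

    long total() {
        return total;
    }
}

// "Return things" version: each call returns a new value and never mutates,
// so every result is a deterministic function of its inputs.
final class Tally {
    final long total;

    Tally(long total) {
        this.total = total;
    }

    Tally add(long amount) {
        return new Tally(total + amount);
    }

    public static void main(String[] args) {
        Tally start = new Tally(10);
        Tally next = start.add(5);
        System.out.println(start.total + " " + next.total); // 10 15: start is unchanged
    }
}
```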

A basic issue of encapsulation: objects shouldn't rely on the mutable state of something else, but only on their own.

> The closer your code resembles pure functions, the easier it is to do dependency injection and unit testing.

If the only thing you inject is data, can we still call that "dependency injection"?

> If the only thing you inject is data, can we still call that "dependency injection"?

I suppose that's a philosophical question.

Probably the 2 most common functions I 'dependency inject' are rand() and time.now(). I feel like they count, but you might not.
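In Java, a sketch of injecting both might look like this, assuming a hypothetical `TokenIssuer` with a made-up token format: `java.time.Clock` stands in for time.now() and a `java.util.Random` instance for rand().

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Random;

// Hypothetical class that needs "now" and randomness. Both arrive through the
// constructor instead of being fetched globally (Instant.now(), Math.random()).
final class TokenIssuer {
    private final Clock clock;
    private final Random random;

    TokenIssuer(Clock clock, Random random) {
        this.clock = clock;
        this.random = random;
    }

    // Token format is made up for this sketch: "<nonce>@<expiry instant>".
    String issue() {
        Instant expires = clock.instant().plus(Duration.ofHours(1));
        int nonce = random.nextInt(1_000_000);
        return nonce + "@" + expires;
    }

    public static void main(String[] args) {
        // Production wiring: the real system clock and a real random source.
        System.out.println(new TokenIssuer(Clock.systemUTC(), new Random()).issue());
    }
}
```

A test can pass `Clock.fixed(...)` and a `Random` subclass that returns a constant, which makes `issue()` fully deterministic.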

I mean, if you pass a HOF to some other function, then that is also a dependency.

Correct, but I haven't seen it happen often in practice. I mean, pretty much every project uses HOF, but few have many of them.

I also tend to avoid HOF when I can instead pass data around explicitly.

You're one sentence away from discovering Common Lisp ; )

Define data.

Anything that doesn't have an arrow in its type.

Mock frameworks make it trivial.

  import static org.mockito.Mockito.mock;
  Foo foo = new Foo(mock(Bar.class));

Not sure what the issue is there really.

> Mock frameworks make it trivial.

In my experience mock objects can be brittle. A few sprinkled in judiciously can be ok, but once the density gets high enough, it starts to feel like the test becomes decoupled from the actual code it's supposed to test.

Agree. To add to this, many unit tests that you might have to do become obsolete with a strongly typed functional language. At that point you’re basically only integration testing the API boundaries / external interfaces.

Early in my career I saw a large legacy project that was riddled with bugs turned around after a senior developer insisted on having unit tests. No one else believed in the value of unit testing, so he added them on his own in his free time. Occasionally another developer would push up some code that broke the senior developer's tests, and he gradually got the upper hand because he now had proof that his tests were finding real problems.

Everyone started writing unit tests, and the code broke less. Developers became more confident in deploying, and eventually most PRs looked roughly the same: 10-20 line diff on the top, unit tests on the bottom. If there were no tests, the reviewer asked for tests. It became a fun and safe project to work on, rather than something we all feared might break at any moment.

I've since started insisting on having them as well, especially when I'm using dynamically typed languages. A lot of the tests I write in Python for example are already covered in a language like Go just by having the type system.

For the first 10 years I programmed in compiled, statically typed languages (C, C++, Java, etc.). Then I needed to start programming in Ruby for production environments, and initially I felt "naked": I felt so insecure building something without the reassurance of a successful compile. That's when I really got into unit tests. Bugs as stupid as typing "vlue" instead of "value" can plague your codebase in languages like JavaScript, Python, and Ruby, and testing is the only way to find them (other than errors in production).

Functional code with no side effects should be unit tested. Integration code, which glues various components together, should have integration tests. If you feel like you need unit tests but have to create too many mocks, you have merged functional and integration code; separate them out.

We initially only had integration tests, because many people think they're better. I get it: itests use the real plumbing, so they're more representative of your runtime. But they're slooooow -- especially the tests that involve the DB (which is most of our itests).

So we started adding unit tests. Utesting code that wasn't written for utests is painful: you often need to choose between refactoring or just patching the hell out of it. The latter is highly undesirable, since it leads to verbose tests, failures when you move a module, and the inability to do blackbox testing.

But utests encourage our new code to be clean and readable. We've found that functional programming is much easier to test than object-oriented, and is easier for engineers to grok. We just sprinkle a little dependency injection and the whole thing works nicely.

Itests have their place, but utests lead to faster feedback and more readable code.

I think this is part of the problem.

Unit tests are an easy path to fall down, because they're clearly easier to set up and write, require less effort to maintain, and execute more quickly.

But you don't realise their significant downside until after you attempt a major refactor - you begin to see that unit tests are testing at the layer that changes the most anyway.

Weird that you started using a functional approach, noticed that it’s easier to unit test, and drew the conclusion that unit testing is what led to more readable code. Consider that functional code is the source of the readability. Also, we don’t typically call it “dependency injection” in the functional world.

You're absolutely right: functional code is the source of the readability. But writing unit tests incentivizes engineers to keep things functional.

What's a better term than "dependency injection"? What should I call an argument whose default is always used in production code, but is there to make passing a mock easy? I'm not trying to be snide -- I'm genuinely curious.

I always just called it a "default argument"
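Java has no default arguments, so the closest equivalent is an overloaded constructor. A minimal sketch with hypothetical names (`RateSource`, `EnvRateSource`, `PriceConverter`, and a made-up `USD_TO_EUR` environment variable): production code calls the no-argument constructor, and tests pass a stub.

```java
// Hypothetical dependency interface.
interface RateSource {
    double usdToEur();
}

// Hypothetical production source: reads the rate from a USD_TO_EUR environment
// variable, falling back to 0.9 if it is unset.
final class EnvRateSource implements RateSource {
    @Override
    public double usdToEur() {
        return Double.parseDouble(System.getenv().getOrDefault("USD_TO_EUR", "0.9"));
    }
}

final class PriceConverter {
    private final RateSource rates;

    // Production code uses this constructor: the "default argument".
    PriceConverter() {
        this(new EnvRateSource());
    }

    // Tests use this one to pass a stub.
    PriceConverter(RateSource rates) {
        this.rates = rates;
    }

    double toEur(double usd) {
        return usd * rates.usdToEur();
    }

    public static void main(String[] args) {
        System.out.println(new PriceConverter(() -> 0.5).toEur(10.0)); // 5.0
    }
}
```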

OT: Good Old Neon is a fantastic piece of writing.

I'm a massive fan of unit testing, but I mostly agree with the observations. However, I (mostly) disagree with the conclusion. The problems with unit testing that I've seen come from the following anti-patterns, in various combinations.

1) The use of unit tests as the exclusive automated test type, i.e., no functional tests, integration tests, etc.

2) Test doubles for most or every dependency, even purely functional dependencies like math libraries.

3) Not using the appropriate kind of test double for the test at hand (dummies vs. fakes vs. spies vs. stubs vs. mocks).

4) The overuse of mocking libraries.
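To make point 3 concrete, here is a hand-written example of four of those kinds in Java, using the commonly cited definitions. Everything here (`WelcomeService`, `UserStore`, `Mailer`, `AuditLog` and the doubles) is hypothetical. Mocks, which verify expected calls themselves, are normally generated by a library and are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical dependencies of the unit under test.
interface UserStore {
    Optional<String> emailOf(String userId);
    void save(String userId, String email);
}
interface Mailer { void send(String to, String body); }
interface AuditLog { void record(String event); }

// Hypothetical unit under test.
final class WelcomeService {
    private final UserStore users;
    private final Mailer mailer;
    private final AuditLog audit;

    WelcomeService(UserStore users, Mailer mailer, AuditLog audit) {
        this.users = users;
        this.mailer = mailer;
        this.audit = audit;
    }

    boolean welcome(String userId) {
        Optional<String> email = users.emailOf(userId);
        if (!email.isPresent()) {
            audit.record("unknown user " + userId);
            return false;
        }
        mailer.send(email.get(), "Welcome!");
        return true;
    }
}

// Dummy: only fills a parameter; the test path using it must never call it.
final class DummyAudit implements AuditLog {
    public void record(String event) {
        throw new AssertionError("dummy was used: " + event);
    }
}

// Stub: returns a canned answer regardless of input.
final class StubUsers implements UserStore {
    public Optional<String> emailOf(String userId) { return Optional.of("ada@example.com"); }
    public void save(String userId, String email) { }
}

// Fake: a working but simplified implementation (in-memory map instead of a database).
final class FakeUsers implements UserStore {
    private final Map<String, String> rows = new HashMap<>();
    public Optional<String> emailOf(String userId) { return Optional.ofNullable(rows.get(userId)); }
    public void save(String userId, String email) { rows.put(userId, email); }
}

// Spy: records how it was called so the test can inspect the calls afterwards.
final class SpyMailer implements Mailer {
    final List<String> recipients = new ArrayList<>();
    public void send(String to, String body) { recipients.add(to); }
}
```

A happy-path test can then combine StubUsers, a SpyMailer and a DummyAudit and assert on `recipients` afterwards, with no mocking library involved.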

Mocking libraries have their place, but in my opinion they are used approximately a hundred, perhaps even a thousand times more often than they should be. I use them to create test doubles in exactly three scenarios:

1) A dependency that does not have an interface, usually a third party library. This usually happens in one place only, and is used for writing the wrapper code test.

2) A dependency that has an incredibly large interface and/or dependency graph where building a set of stubs or spies is simply not worth the effort.

3) I want to test weird edge cases that can't be reached any other way, such as theoretically unreachable code.

These should not be the majority of your unit tests!
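A minimal sketch of scenario 1, with hypothetical names throughout: `VendorGeoClient` stands in for a third-party class that has no interface, `VendorCountryLookup` is the one thin wrapper that touches it (the place where a mocking library might be used), and the rest of the application depends only on our own `CountryLookup` interface, so its tests can use a plain lambda.

```java
import java.util.Locale;

// Stand-in for a hypothetical third-party class with no interface.
final class VendorGeoClient {
    String lookupCountry(String ip) {
        // Placeholder: imagine a network call here.
        throw new UnsupportedOperationException("real network call not implemented in this sketch");
    }
}

// Our own narrow interface: only what the application actually needs.
interface CountryLookup {
    String countryFor(String ip);
}

// Thin wrapper: the single place that touches the vendor class.
final class VendorCountryLookup implements CountryLookup {
    private final VendorGeoClient client;

    VendorCountryLookup(VendorGeoClient client) {
        this.client = client;
    }

    @Override
    public String countryFor(String ip) {
        String code = client.lookupCountry(ip);
        return code == null ? "UNKNOWN" : code.toUpperCase(Locale.ROOT);
    }
}

// Application code depends only on CountryLookup.
final class ShippingPolicy {
    private final CountryLookup lookup;

    ShippingPolicy(CountryLookup lookup) {
        this.lookup = lookup;
    }

    boolean shipsTo(String ip) {
        return !lookup.countryFor(ip).equals("UNKNOWN");
    }

    public static void main(String[] args) {
        System.out.println(new ShippingPolicy(ip -> "DE").shipsTo("203.0.113.7")); // true
    }
}
```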

Code is a liability. Unit Tests are code obviously and are no less prone to contain bugs than the code under test. And of course they require maintenance just like any other part of the codebase.

It feels like the industry has blindly pushed for unit testing everything and 80% or more code coverage as the gold standard.

I’ve given up arguing about the cost/benefit of unit tests at work. I feel that the teams I’ve worked on over the past couple of decades still produce about as many bugs as they did before unit testing came along. I’m not building pacemakers or aviation software, mostly LOB applications.

Unit tests provide a false sense of security (especially to management). Yes, sometimes they help catch refactoring bugs, but at what cost?

The article omits a key point that applies to any practice: the context in which unit testing is performed. The size of the team, the type of company, the technology, and the impact of product defects.

For a startup with a small team and few customers building an MVP? Unit testing is overrated.

For a company with 50 engineers in 10 teams building a product that moves $500,000/day in revenue? Unit testing may or may not be overrated.

For a company with 1,000 engineers working in the same repo, shipping a product that moves $50M in revenue per day? Unit testing is most likely underrated - and essential.

You cannot ignore how the organization works, and the cost of a defect that a unit test could have caught. I happen to work at the third type of organization, and while unit tests might not be the most efficient type of safety net, it is a very big one. We have other testing layers on top of unit tests as well: integration and E2E tests.

Also, one more fallacy in the article: "If we look back, it’s clear that high-level testing was tough in 2000, it probably still was in 2009, but it’s 2020 outside and we are, in fact, living in the future. Advancements in technology and software design have made it a much less significant issue than it once was."

This is not true everywhere. High-level / E2E testing on native mobile applications in 2020 is just as bad as it was on the web in 2009.

> Unit testing is most likely underrated - and essential.

You are right, but it still doesn't mean aiming for high coverage. In the big company case you'll want to cover the interfaces and dependencies and less of your team's code.

I know that part of this will fall under "integration" but definitions are sneaky.
