Hacker News
Unit Tests Considered Harmful (shaiyallin.com)
36 points by kiyanwang 54 days ago | 74 comments



The target audience for unit tests is not the client, it is the developer. Unit tests allow you to change code with more confidence. 100% code coverage is not a useful aim, you should aim for 100% confidence in your code. Unit tests can also function as example code that can't get out of date, since if it did, the tests would fail.

Testing for quality assurance is a different thing, usually called acceptance testing and sometimes including regression testing. One easy way to make clients angry is by re-introducing bugs that were fixed in earlier releases.


A few years ago, I got tired of my company talking about plans to upgrade our main app from Python 2 to Python 3, until one weekend I just did it. The tests caught a million little changes that I worked through until everything passed. Come Monday, we were on Python 3 and I took a couple workdays off to play video games.

I wouldn’t have dared even start if I didn’t have confidence in our test suite.


I was not so lucky! I had a good test suite, but there were a lot of places where I mixed up strings and bytes. I had to add new parameters to specify Unicode encoding/decoding options, new tests to handle those failures, and new APIs so I could have one function return bytes and another return strings.

Plus, I had C extensions, which had to be updated to handle changes in the Python/C API, including places where Python 2.7 could handle both Unicode and bytes in the ASCII subset:

  >>> u"bbcf".decode("hex")
  '\xbb\xcf'
  >>> "bbcf".decode("hex")
  '\xbb\xcf'
  >>> b"bbcf".decode("hex")
  '\xbb\xcf'
but under Python 3 required more work:

  >>> bytes.fromhex("BBCF")
  b'\xbb\xcf'
  >>> bytes.fromhex(b"BBCF")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: fromhex() argument must be str, not bytes
AND, I needed to support both Python 2.7 and Python 3.5+ on the same code base.
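
For illustration, a small shim along these lines (a hypothetical helper, not my exact code) papers over the hex-decoding difference on a shared 2.7/3.5+ code base:

  import binascii

  def hex_to_bytes(s):
    # binascii.unhexlify accepts a hex str on both 2.7 and 3.x, so a thin
    # wrapper keeps call sites identical across interpreters.
    return binascii.unhexlify(s)

  hex_to_bytes("BBCF")  # '\xbb\xcf' on 2.7, b'\xbb\xcf' on 3.x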

AND, I needed to support various third-party tools that had their own different migration paths for how to handle the transition.


As the author points out, "What constitutes a unit? Is it a function? A class?"

If you test to a function, or a class, then your tests imply that function or class must be present, with that specific API.

In my experience, most people write unit tests for internal implementation details that are irrelevant to the business domain. They end up inhibiting code change rather than encouraging change, because any change often requires re-evaluating each failing test to see if it was meaningful in the first place - and if 100s of tests are no longer meaningful, it's easy to skip the couple of tests which are true regression tests.

As the author writes, full end-to-end tests "are often slow, cumbersome, hard to debug and tend to be flaky."

Instead, find the internal interfaces which are tied to the business logic ("something that delivers value to a paying client"), and write the tests to that. You can use unit test frameworks for that sort of functional testing.


>If you test to a function, or a class, then your tests imply that function or class must be present, with that specific API.

So what? Every implementation will have some basic structure. That structure can be modified if needed. The point of a unit test is not to posit that any particular thing exists, but that the things that do exist actually work.

>In my experience, most people write unit tests for internal implementation details that are irrelevant to the business domain. They end up inhibiting code change rather than encouraging change, because any change often requires re-evaluating each failing test to see if it was meaningful in the first place - and if 100s of tests are no longer meaningful, it's easy to skip the couple of tests which are true regression tests.

If the internal implementation is not observable without a bunch of other bullshit, these tests can help instill confidence that the stuff actually works. If you don't test, or test the overall system, it takes much longer. There is such a thing as a pointless test but it is far more common in my experience to have stuff that isn't covered at all by tests. If your biggest problem is that you have to delete some tests that you made obsolete, that's perfectly ok.

>As the author writes, full end-to-end tests "are often slow, cumbersome, hard to debug and tend to be flaky."

Those types of tests are not unit tests.

>Instead, find the internal interfaces which are tied to the business logic ("something that delivers value to a paying client"), and write the tests to that. You can use unit test frameworks for that sort of functional testing.

It isn't only the business logic that needs to be tested. Anything that is cumbersome to test "enough" in the overall system ought to be unit tested. At work I'm faced with a series of structures that are cumbersome to test in isolation and in totality. If I had unit tests I could make changes at least 3x faster.


> Every implementation will have some basic structure.

Setting aside functional vs OO paradigms, even if you have a simple helper function like:

  def removeprefix(s, prefix):
    if s.startswith(prefix):
      return s[len(prefix):]
    return s
do you write tests for it? Or do you write tests for the higher level routines which call it? I think most write tests for the function.
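
For concreteness, a function-level test might look something like this (a hypothetical pytest-style sketch):

  def test_removeprefix():
    assert removeprefix("unittest", "unit") == "test"
    assert removeprefix("unittest", "spam") == "unittest"
    assert removeprefix("", "unit") == ""

Every assertion there names removeprefix directly, which is exactly the coupling at issue below.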

If you write tests for the function then you hinder future refactoring from removeprefix(s, prefix) to s.removeprefix(prefix) once you switch to a Python which implements s.removeprefix (3.9, I think?)

For example, in red-green-refactor TDD, you are not supposed to change the tests when you refactor.

What you've ended up with are tests for the specific structure, and not the general goals.

> If you don't test, or test the overall system, it takes much longer.

Good thing neither I nor the linked-to author makes either of those arguments.

> it is far more common in my experience to have stuff that isn't covered at all by tests

Which is why I use a coverage-based method to identify what needs more tests. Coverage is a powerful tool, and easily misused.

> If your biggest problem is that you have to delete some tests that you made obsolete, that's perfectly ok.

The biggest problem is that you decide to not refactor because there are so many tests to delete, and you have to figure out if the failing test really is okay to delete (because it was implementation specific) vs. one that needs to be updated (eg, because it handles a regression case).

> Those types of tests are not unit tests

Correct! And no one said they were.

> At work I'm faced with a ...

That's fine. You do you.


>do you write tests for it? Or do you write tests for the higher level routines which call it? I think most write tests for the function.

I try to not test things that are that simple. But I wouldn't fault anyone for writing a small number of test cases for it.

>For example, in red-green-refactor TDD, you are not supposed to change the tests when you refactor.
>
>What you've ended up with are tests for the specific structure, and not the general goals.

I'm not familiar with this methodology. But if your refactoring requires changing the interfaces then it must require changing the tests. It may be ideal to have such rigid interfaces in some cases, but I don't think you have to be so committed to one methodology. Big changes require big testing.

>If you write tests for the function then you hinder future refactoring from removeprefix(s, prefix) to s.removeprefix(prefix) once you switch to a Python which implements s.removeprefix (3.9, I think?)

How? Is the old code supposed to stop working because a new function was added? You aren't obligated to delete perfectly working and tested code. And if the new version of Python breaks it somehow, the tests will tell you quickly compared to finding out in a bigger test.

>Which is why I use a coverage-based method to identify what needs more tests. Coverage is a powerful tool, and easily misused.

Coverage is fine. But I mean, coverage is for people who already committed to testing every line of code. Not every project has unit tests in the first place so coverage is a moot point.

>The biggest problem is that you decide to not refactor because there are so many tests to delete, and you have to figure out if the failing test really is okay to delete (because it was implementation specific) vs. one that needs to be updated (eg, because it handles a regression case).

This work is very easy compared to fixing the real product in most cases. If you didn't write a bunch of frivolous tests it isn't that much work.

>>Those types of tests are not unit tests
>
>Correct! And no one said they were.

Well this is a discussion about unit tests. If you want to talk about other types of tests you need to be clear about that.

>That's fine. You do you.

I was just pointing out that unit tests are not always feasible to do. But I wish they were. The stuff I see at work was not designed by me, so at least it's not my fault that it is infeasible to test.


> You aren't obligated to delete perfectly working and tested code.

You aren't obligated to write unit tests either.

If your goal includes long-term maintainability then you should refactor for simplicity and consistency. Removing unneeded code and unneeded tests is a good thing.

> coverage is for people who already committed to testing every line of code.

Much as I would like it, I don't have 100% coverage. I use coverage to identify and prioritize which areas of code need more tests. Some are more important than others.

I also use it to identify code which is no longer needed, like workarounds for third-party-package V2021 which have been fixed in later releases, and I have no need to support V2021 any more.

> If you didn't write a bunch of frivolous tests it isn't that much work.

My observation is that people write a bunch of frivolous tests.

> If you want to talk about other types of tests you need to be clear about that.

I am talking about how to design unit tests.

Someone else here pointed to Robert Martin's essay on exactly this topic, at http://blog.cleancoder.com/uncle-bob/2017/10/03/TestContrava... .

"Design the structure of your [unit] tests to be contra-variant with the structure of your production code. ... The structure of your tests should not be a mirror of the structure of your code."

The issue I'm describing is what Martin characterizes as "the Fragile Test Problem" with some unit test approaches.


>If your goal includes long-term maintainability then you should refactor for simplicity and consistency. Removing unneeded code and unneeded tests is a good thing.

I agree with this in principle, but I think people disagree about what is needed. One person might look at a good set of tests and think it's a waste of time, but if it actually hits the right points then it's probably worth keeping. I think if unit tests are feasible and can hit a bunch of cases that would be hard to recreate otherwise, then unit tests are far superior.

>"Design the structure of your [unit] tests to be contra-variant with the structure of your production code. ... The structure of your tests should not be a mirror of the structure of your code."

That's a nice ideal. But as with the refactoring issue, it may be impossible to test a thing without recreating some of that structure. That's why we have mocks.


Depends on how your unit tests were designed. You need to avoid the Fragile Test Problem. Bob Martin wrote about this a while back: http://blog.cleancoder.com/uncle-bob/2017/10/03/TestContrava...


There needs to be a rule against having to read entire articles or YT videos just so that a reader can understand the point that a commenter is trying to make, but which they cannot explain in their own words, no? It’s very anti-discussion, and how do we even know to what extent commenters are quoting or agreeing with their sources?


Your unit tests should be contravariant to the code they're testing, so that your code can be refactored and your existing unit tests can be used as-is to test the refactored code.

The Fragile Test Problem arises when your unit tests are covariant to the code they're testing, such that a significant refactor breaks the tests - not because the code is necessarily wrong, but because the structure of the code is no longer what the unit test was expecting.


While the link in question isn't a good article, for complex subjects like this a long article is much better. Even for a great writer, the article linked should take several days to create (again, I would approach it differently, but the content deserves days or even weeks of work), and so it cannot fit in a comment box except in a reduced form that doesn't do the idea justice.


Agree. The "what about the client" trope is a complete non-argument. If unit tests make code better, the client maybe wants that. The argument should purely be based on if they are helpful.


I'll take it a step further than this. It doesn't matter if the client wants unit tests or not. If the developer has been hired to do the job, it's going to be up to that developer whether or not they feel unit tests will make the code better. If they do, they should do them. If they don't, they shouldn't.

Caring about the client's opinions re: unit tests is a little like caring about their opinion re: the interior color of your vehicle. Sure we can talk about it if it comes up over a beer some Friday evening but we're certainly not going to change anything based on the conversation.


Most clients should care about code that is maintainable long term even if it isn't in the contract. Thus tests are good. As a professional engineer, you should care about that (I'm intentionally invoking "professional" in the sense of other industries, where the term has legal implications for quality; that doesn't exist for software, but it should!).


If it is in the contract and specs (either way) then you do what you are paid to do.


Well of course. That's not relevant to the larger point, though.


How do you measure 100% confidence? Pretty sure I have never reached 100% confidence, but I haven't had a metric to track it so I have no data to prove it.

The only possible way to reach near 100% is I think by not writing any code, and by not building anything.

I do think a 100% code coverage goal doesn't mean much, since you can potentially achieve it using 0 assertions. Or you make 1 assertion per test, leaving 95%+ of expected results unasserted. Assertions could also be wrong.
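
A toy illustration of what I mean (hypothetical code):

  def discount(price, pct):
    if pct > 100:
      raise ValueError("pct > 100")
    return price * (1 - pct / 100.0)

  def test_discount_runs():
    discount(100, 10)       # normal path executed
    try:
      discount(100, 200)    # error path executed
    except ValueError:
      pass
    # every line of discount() is now "covered", with zero assertions made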


You measure how many of your deployments ship broken code. I've worked in contexts where test coverage was sufficient to be confident that when deploying there wouldn't be any issues apart from really novel unexpected stuff you wouldn't have tested anyway.

I've also worked in contexts where I didn't have that confidence.

I would say you know when you're in that environment and directly measuring doesn't really help as it's more a matter of culture than technical metrics.


Because it's linked in another post right now:

If you feel confident that renovate[1] pull requests can be merged without review on a green pipeline, you have 100% confidence in your tests ;-)

My first context was that way.

[1]: https://www.jvt.me/posts/2024/04/12/use-renovate/


What do you mean by "broken code"? Any code that yields a bug? Any sort of bug? Taken literally, "broken code" implies to me some sort of compilation error. Measuring it also seems very complicated, depending on how you interpret "broken code". E.g. a bug could affect 100% of the customers, 5% of the customers, or 0.1% of the customers and only when certain conditions happen; it could be a combination of deployments introducing it. It could be a bug that can easily be worked around by a customer. It could be a bug that only arrives 6 months down the line, because some dependency changed and you didn't have proper handling for it, even though it was expected that you had. E.g. you are calling a third-party API that alerts you in advance that they are planning to add an extra error code in 6 months, so you should probably also handle it, but you don't, and then the system breaks due to that.

The problem could arrive 2 years in the future when scale increases and the code wasn't built to handle it, even though it was expected.

You take 100 of your past deployments and see how many resulted in some sort of a bug? E.g. if there's 0, then you have 100% confidence?

> any issues apart from really novel unexpected stuff you wouldn't have tested anyway.

I mean that's one thing that makes it impossible to have it 100% for me at least.

> I would say you know when you're in that environment and directly measuring doesn't really help as it's more a matter of culture than technical metrics.

I may have been in wrong environments throughout my whole life, but I also have trouble imagining a perfect environment where there's 100% confidence.


Confidence can be achieved using mutation testing (https://en.wikipedia.org/wiki/Mutation_testing). This involves running an engine that mutates your code and checks that you have at least one assertion failing for each mutant.
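
A hand-written sketch of the idea (hypothetical code; real tools such as mutmut or Stryker generate and run the mutants automatically):

  def is_adult(age):
    return age >= 18   # original code

  def is_adult_mutant(age):
    return age > 18    # mutant: ">=" changed to ">"

  def test_is_adult_boundary():
    # Run against the mutant, this assertion fails, so the mutant is
    # "killed"; a surviving mutant means no test checks that boundary.
    assert is_adult(18) is True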


I agree. The post completely misses the point of unit testing in the first place by drawing a relationship to the features being implemented.


Everyone talking about unit tests, and testing in general as a thing to achieve are missing it entirely. Tests aren't a thing to achieve, and talking about whether unit testing, mocks etc. are better or worse without context is pointless. The thing to achieve is having a code base that 1) works 2) can be changed easily and proved to be still working. Now that we know that we can start deciding "how" we can achieve that and it will be different for every code base (and often different strategies for different parts of the code base). The key thing to realize in a test strategy is that your code has two interfaces, not one. The first interface is the user, the second is some form of automated test harness. Hence, testing becomes a function of two points above and a design required to achieve that.

In summary, don't do this:

- I need 100% code coverage

- Everything needs a mock

- Ensuring unit tests for everything

Do this:

- Ask how can I ensure my code works?

- How can I ensure my code keeps working?

- What sort of testing strategy can achieve the above?

- What sort of testing strategy is easily maintained without causing tons of excess burden?

- What interfaces do I need on each module of my software to achieve this?
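
As a sketch of that last question (hypothetical names), the second interface is usually just a seam the test harness can plug into:

  class OrderService:
    def __init__(self, payments):  # 'payments' is the seam for the harness
      self.payments = payments

    def checkout(self, total):
      if total <= 0:
        raise ValueError("empty order")
      return self.payments.charge(total)

  class RecordingPayments:         # what the test harness supplies
    def __init__(self):
      self.charged = []
    def charge(self, amount):
      self.charged.append(amount)
      return "ok"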


I wish management knew this... It's a nightmare when some manager that hasn't coded in years thinks it's a good idea to push unit testing into the CI builds so that it fails if it doesn't meet code coverage criteria etc.


I'm afraid that in practice answering the latter questions brings you back to the former statements.


Very good advice. Additionally, I'd add:

- What sort of testing can be achieved given limited time and resources?

...because nearly all organizations, no matter how quality-oriented they claim to be, will prioritize non-test code delivery over test code, leaving scarce time for test creation before shipment deadlines.


Not to pick on you specifically, but I tend to agree with other posters that testing (automated or otherwise) is just an element of programming. Like all elements it needs to be done to taste, but it's pretty essential.

A line of questioning: Do you have time to write clear code? Time for comments? Time to manually test your changes hundreds of times? Time to refactor existing code when adding new code? Time to remove dead code? Time to automate the testing while you still have the little state machine you are working on in your head? Time to add observability to spot performance issues? Time to consider your rollout plan? Time to keep on top of changes which are in use? etc...

Automated tests (including unit tests) are just one part of writing correct code. If you are asked, "How long is it going to take?" that implies finishing all aspects of coding required to get something correct into use. You prioritize that, not anyone else.


I fully agree with all your points. My comment was based in my observations that, even given the truth of everything you said, "time to prepare automated tests" is usually one of the first things to shrink as unexpected events cause projects to slip on their timelines.

That's not a good thing, but it is a real thing: healthy organizations should respond by protecting that time, extending deadlines, and assessing what about their process and environment is causing planned timelines to not match reality ... but that doesn't always happen. When it doesn't happen, testing discipline can slip from (for example) "we need unit tests for complex logic modules, and integration tests for real networked service interactions, and at least 70% coverage" to "just get the happy path tested however you can so this feature makes it into the next release".

Given the reality that this will sometimes occur, engineers should remember to always prioritize the highest-value testing work and methodologies (that is, the best tradeoff between time spent creating tests and the defects those tests can catch). In good conditions, that will result in them writing the most important tests first and then writing any other tests they feel they need. But in bad conditions, this approach will still ensure that the most important automated verification is present.


It keeps coming back to the same stupid erosion of paradigms.

Once upon a time, some developers that were smarter than others started testing their code. They described ways to test smaller parts of large software systems separately before being integrated together.

For instance "parameter testing" (to validate component subprograms against their specification) and "assembly testing" (for parts put together)[^1]

As always happens, a formal definition soon followed and the industry settled on names like "unit tests", "component tests" and "integration tests".[^2]

Of course, as soon as things are described somewhere, this means there is documentation for developers to ignore. Or misinterpret.

Next thing you know, "Unit-Testing" is (ab)used, and people start complaining it doesn't work.

So some developers that were smarter than others say "You are doing it wrong!" and describe a better way of doing things... For instance, writing the test _before_ you write the code, so the scale/scope of a "unit" is defined beforehand. There is much rejoicing, and TDD is the new magic bullet.

So now there is more documentation for developers to ignore. So some developers that were smarter than others say "You are doing it wrong!" and define BDD, which is basically just TDD "done right", as the term TDD has become polluted. [^4]

So now there is more documentation for developers to ignore. So some developers that were smarter than others say "You are doing it wrong!" and define DDD, which is basically just BDD "done right", as the term BDD has become polluted.

And so the merry dance continues.

Rather than understanding which practice helps improve quality in which (coding) problem domain, developers stomp around with their preferred hammer, looking for nails, complaining about everybody else's hammer.

I'd say, rather than using one or the other based on developer preference, focus on things from the user perspective... Take a look at the Test Automation Pyramid and start writing the right tests in the right domains depending on business/domain value.

[^1]: H.D. Benington, 1956
[^2]: James J. Donegan; Calvin Packard; Paul Pashby, 1964 and Norman A. Zimmerman, 1969
[^3]: Lee Copeland, 2001
[^4]: Dan North, 2006


Studies on TDD have failed to show a benefit over writing tests after the fact. The only factor that seems to matter is writing tests, the more tests the better. If you can write your code so it's amenable to more testing, or such that it requires fewer tests because it's simpler/has fewer cases to handle, great. If you have a framework that can generate tests (fuzzing, property-based testing), awesome.
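
For example, a property-based sketch using the Hypothesis library (self-contained, with a toy helper):

  from hypothesis import given, strategies as st

  def removeprefix(s, prefix):
    return s[len(prefix):] if s.startswith(prefix) else s

  @given(st.text(), st.text())
  def test_removeprefix_property(prefix, rest):
    # For any two strings, prepending a prefix and removing it again
    # must give back the remainder.
    assert removeprefix(prefix + rest, prefix) == rest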


I'm pretty convinced that test-first is best seen as a negotiating technique to get the time to write the tests in the first place.

"Okay, the code is done, time to write the tests" results in pressure to shorten testing, since if the golden path works then surely everything else works, right?

Your management and your sales people then say "You're a great developer, it looks like it works, and we need to deploy that code now because customers are demanding it and the competition is breathing down our necks."

It's hard to resist that pressure.

Whereas saying that TDD is best practice means you don't need to deal with that sort of negotiation.


I can see that, but only because the dev has arguably already made the mistake of showing sales working features before tests are written. I've always been clear to say when something is a partial mockup to verify that I've understood what they're asking for.


Yes, but many places don't have a healthy work environment, and developers are generally younger, less confrontational, and less experienced at negotiating than management and sales.


True that. One of the major factors of my success as a software developer is a lack of fear (either through stupidity or bravery) of "taking on" toxic/unfair/destructive environments and people.


Studies on software engineering are all low-reliability ones full of confounding factors.

(What's the matter with SE as a discipline?)


Yeah, that's something I've been promoting for years. I don't care whether you write tests first or last, as long as you eventually get into a feedback loop of coding and verifying your code, quality will improve.

The same goes for customer specs. Just build a thing and verify if this is right.

Rinse, repeat. Progress is the only metric.


If you're not Dijkstra, then using a title like that negatively affects people's perception of your post.


“Considered Harmful” considered harmful? :)


“Considered Harmful” Essays Considered Harmful, Eric A. Meyer, 28 December 2002:

https://meyerweb.com/eric/comment/chech.html


I usually assume it's going to be a controversial take, but controversy shouldn't be inherently negative to anyone. Plenty of the best things we have today caused and were propelled by controversy.


The title was Wirth's fault.


I think what's more harmful than unit tests are code coverage metrics for unit tests that devs feel compelled or are required to achieve. The easiest way to achieve code coverage goals for tests is to write lots of small tests that test individual methods, but test very little of the interaction between them.

I feel that the goal of unit testing should be to test the largest unit possible without requiring external dependencies. In the language of domain driven design, this means to test the domain model. If you can get extensive coverage of the domain model as a whole, system tests can be used to test the complete system.

Alas, I have seen very few software systems with high quality domain models. It is not an easy thing to achieve.


Automated tests are great, the problem described is over-reliance on mocking. Mocking should only be used for dependencies which have non-deterministic behavior or where it is otherwise impractical to use the real implementation.
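
A small sketch of the kind of dependency meant here (hypothetical names): the system clock is non-deterministic, so the test replaces it and everything else stays real.

  from datetime import datetime, timezone
  from unittest import mock

  def greeting(clock):
    # 'clock' is anything with a now() method returning an aware datetime
    return "Good morning" if clock.now().hour < 12 else "Good afternoon"

  def test_greeting_before_noon():
    clock = mock.Mock()
    clock.now.return_value = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    assert greeting(clock) == "Good morning"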


The author is mostly complaining about mocking from what I can tell.

>>>> Strict coverage for classes prevents regression in these classes but does not assert that the feature actually works

literally the point of unit tests


The clue is in the name. Unit Tests test units of code (however you want to define them). Integration Tests test how units of code work together.

Perspective is also useful. What looks like an integration test to a low level developer, could be a unit test higher up the stack.


This the best talk I've ever seen about testing (Ian Cooper): https://www.youtube.com/watch?v=EZ05e7EMOLM

I see he's done a much more recent follow up, which I intend to watch: https://www.youtube.com/watch?v=IN9lftH0cJc Not sure if it contains new ideas.

He says the unit is not "a class" but a module. It's up to you to decide what the module does (ie. what level of abstraction it's at). More importantly, he argues against mocking and essentially "unit testing" in most cases.


This whole article could be summed up as 'Be careful when you're mocking' and that would be sound advice.


Tests - whatever you wish to call them - are an assertion that something will always and forever be true. A unit thus needs to be a unit of code that you are telling all future maintainers that they cannot refactor such that this changes.

If you are writing the list data structure for a new language (if you like LISP you are looking for a better name than car/cdr, while C++ guys are thinking of vector without the template mess, and ... all are correct for this discussion), it will quickly be used by everyone, so it doesn't matter that you can't change the API: the data structure will soon be used everywhere in your code, and a change isn't possible anyway. However, most classes/functions are only used in a couple of places close to them, so it isn't a big deal to change them - thus you shouldn't write tests that force the API.

When you have a large system eventually you need to break it up into things that are smaller just to understand. Those are good places to write tests and mocks as you get flexibility internally to do what you want and you don't really understand what is going on outside. However someone needs to write whole system integration tests as nobody understands what happens between the smaller parts.


I've seen this too many times: Developer gets obsessed with unit testing everything in the codebase. Tons of work, months later we have 100% coverage. Great. Then the rate of defects in production increases or stays steady. Why?

Confidence in units != confidence in your production system. I've observed that overconfidence in "units" tends to blind devs to other potential problem areas.

The most egregious example was an application release that wouldn't even start - there was an obvious bug in main that would have been caught by simply running the damn program and using your eyeballs. But the unit tests didn't catch it because main was tested in a mocked environment. Oops.

Likewise, I've seen similar regressions in quality control, user experience, performance, etc. - we know these things are critical to the real-world success of software, yet unit tests don't touch them.

There are software developers who are so overconfident in their unit tests that they ship code without ever bothering to run the application! Functional requirements be damned. "But the unit tests said it worked..." is not a professional excuse. Don't be that developer. Edit: to be clear, this is a message to myself as well. I was that developer a few times!


I tend to prefer using test harnesses[0], but I also use unit tests - a lot.

I think having a fundamental Quality mindset is what we actually need, and there's really no way to automate that.

[0] https://littlegreenviper.com/miscellany/testing-harness-vs-u...


OP here,

As some have noted, I’m not actually against tests. Quite the contrary - I’ve been practicing TDD since 2013. My approach to software is driving emergent design by TDD, initially via an E2E test, then via fast, integrative acceptance tests that exercise features while faking IO. I do write unit tests for very specific and pure pieces of software that embody a business domain, such as algorithms or validations.

What I abhor is compulsively writing mockist unit tests for all classes or functions with the goal of satisfying some coverage metrics. These are hurting you thrice: you waste time writing them and maintaining them; you get a false sense of confidence because you have “full coverage”; and when trying to refactor, they hold you back.
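
To give a flavor of what I mean by a reliable fake (a simplified sketch, not code from the post): the fake is a real, if tiny, implementation of the IO port, so tests exercise behavior instead of asserting on call sequences.

  class InMemoryUserRepo:
    # a fake: behaves like the real repository, minus the database
    def __init__(self):
      self._users = {}
    def save(self, user_id, user):
      self._users[user_id] = user
    def find(self, user_id):
      return self._users.get(user_id)

  def register(repo, user_id, name):
    if repo.find(user_id) is not None:
      raise ValueError("duplicate user")
    repo.save(user_id, {"name": name})

  def test_register_rejects_duplicates():
    repo = InMemoryUserRepo()
    register(repo, 1, "Ada")
    try:
      register(repo, 1, "Ada again")
    except ValueError:
      pass
    else:
      raise AssertionError("expected duplicate to be rejected")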

You can read more about my approach here: https://www.shaiyallin.com/post/fake-don-t-mock


Yes, this article is correct. I have also been saying this in comments on this site. Automated tests should be on the right level to provide value. On a level that is too low, one ends up with tests that test implementation details or the highly interesting feature that the list type in the standard library of your favorite programming language is still capable of adding elements to itself. On a level that is too high one ends up with fragile tests where there is a lot of state to initialize before anything can be tested. Tests can occur on any level but they should be on a level that makes them interesting, which generally means that they test a property of the system that the customer could recognize as something they value. Most tests should maybe test how between one and five classes work together.


You unit test the pieces, then integrate and system+regression test the product.

It's got nothing to do with how you define unit - it's just a way to:

1) speed up overall testing and bug fixing

2) get better test coverage of units

3) stop wasting other team members' time by testing your unit before saying it's ready for system test


> This approach has seen a lot of traction during the past 15 years. [...] And it’s complete and utter rubbish.

I'm a bit tired of hearing this about literally every aspect or approach in programming.

There is generally an appropriate application for almost any tool, design pattern, etc. I see little to no value in making sweeping statements about how "rubbish" something is, especially something like unit testing (regardless of how most would define it).


I see unit tests as a form of written contract. You write down what matters. Everything else is as solid as a verbal contract.


That’s how it works too for medical software: the test proves that the specification has been implemented in a feature, that the feature kinda works, and as a bonus you get some code coverage.


The obligatory definitive guide, of which the author appears to be unaware: http://www.growing-object-oriented-software.com/


> My approach prefers extracting IO operations to adapters, testing them separately, then using reliable fakes to test the bulk of the system’s behavior from the outside

...which is exactly a unit test.

If you want to talk about "functional core, imperative shell" and its implications for tests, then do _that_. Writing clickbaity titles that directly contradict the content of your short blog post itself isn't a great way to foster nuanced discussion.


Excessive mocking can be problematic. I prefer to test the real thing where possible, even if this makes most of my unit tests actually "integration tests" in the eyes of some.


As long as there's a way to identify a true unit test (no outside dependencies) vs an integration test (some dependencies like a DB or Redis), it doesn't matter to me. I'd like to be able to see which category they're in and run them separately (for CI purposes), but they're all just code that runs other code and checks the output, at the end of the day.
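
For instance (one possible convention, assuming pytest): tag the tests that need real dependencies and let CI pick the set it wants.

  import pytest

  @pytest.mark.integration          # needs a real database / Redis
  def test_orders_roundtrip():
    ...

  def test_price_rounding():        # plain unit test, no marker
    assert round(19.999, 2) == 20.0

Then pytest -m "not integration" runs the fast set and pytest -m integration runs the rest (registering the marker in pytest.ini keeps pytest from flagging it as unknown).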


Clickbait - the issue being described is related not to unit tests but to mocks ;(


This reads like a junior developer's first foray into the concept of the test pyramid. Congrats, author has learned that there is a forest beyond the trees.


And, if not for testing, how would the xz hack have been exposed?

Vast numbers of projects with tests >>> an occasional blog post with some quibble.


There we go with "considered harmful" again.


This reeks of being contrarian for the sake of it. Surprised it didn't come from some 37 Signals related publication.


Once again, missing the point of unit testing. Unit testing is about improving the design of the software, and allowing the design to evolve through refactoring (which is effectively impossible without unit testing). E2E or integration tests are for catching bugs.


Testing is difficult. It has both the problems of perverse metrics as well as all the problems inherent in coding itself. Unit tests aren't rubbish, they just aren't the only type of testing needed for complex projects. They are a good starting point and I suspect their popularity comes from that. This article was mostly fluff and I regret the 3 min I spent testing it out. Save yourself a read, it's mostly a hot take with a clickbait title.


Enough with the hyperbole. The argument is that unit tests alone are insufficient to ensure your software is functional and to detect regressions, and likewise unit test coverage is not an incontrovertible measure of quality or reliability. We can just present that argument on its own merit without the hot take.

My observations:

- developers need a fast feedback loop. While you’re in the process of changing things, waiting many seconds or (ugh) minutes for feedback not only wastes time but risks pulling you out of a flow state.

- e2e and integration tests are often much slower than unit tests and often require more coordination, putting pressure on the prior point. If you’re able to get e2e/integration tests working just as fast, good for you, do what you want (but you’re probably mocking a lot to do so, isn’t that what the author was complaining about)?

- It’s usually difficult to build e2e or integration tests until all the components are at least stubbed out. This is fine for finding regressions and providing evidence you’ve met your acceptance criteria, but they can be awkward to use during the active development process where you are refactoring considerably.

- e2e/integration tests are harder and a bit more expensive to build, and this is made worse when you have lots of edge cases you’d like to cover.

- the slower, the more expensive, the less immediately useful your testing approach is, the less likely a developer is to follow it, either by avoiding it entirely or by creating low value tests. Sure, the former can be managed by mandate, the latter is squishier.

So what’s my point? It doesn’t have to be all or nothing. I usually end up with a lot of fast unit tests to cover tricky edge cases of all my more critical components, and a smaller number of integration tests aligned with acceptance criteria to provide regression coverage. And then smoke tests. And then load and stress tests. And then… well, anyway, use the ones that are appropriate to your project and available resources, but defense in depth is a strategy that also works for quality (e.g. a coordinated combination of several types of tests gives you more assurance, if you’ve got the time and money)

There’s a reason we’ve developed a lot of different testing techniques/types; they have different purposes and deal with problems. Trying to roll back the clock and thinking you can solve all your testing problems in one shot is IMO naive.


Excellent point. I think you really get to the heart of the issue.

Different kinds of tests offer different levels of confidence at different levels of time and investment.

A fully integrated shop should think this through carefully and design their own comprehensive end-to-end process. Which feedback is most helpful at which point of the idea -> deployed product pipeline? What value does it offer to the larger process? Feed in org-specific variables like release timelines, team structure, customer expectations, money, compute resources. Use appropriate tools for each piece and remember the largest goals to keep the company reliably delivering whatever makes that company valuable.


The author conflates unit testing with integration testing, although it seems the terminology is a huge source of confusion for articles like this. Anything with mocks is an integration test. Units are pure functions only. Generally a mixture of unit and e2e is the best way to go. Mocked out integrations tend to be a nightmare to maintain and provide little value.


> Anything with mocks is an integration test.

That's ... somewhere between "flat out wrong" and "I have never seen that terminology in use in my career so maybe I'm just extremely unlucky".

Sure, when things are on the line between integration and other types of tests the definition is often subjective/debatable. However, in my and my colleagues' experience, most integration tests are distinguished by reduced amounts of mocking, especially as pertains to: dependencies provided by other teams/business units; dependencies provided by third-party code; dependencies provided by networked services (including databases, caches, etc.).

In my experience, this tends to result in tests that take a bit longer to run and, due to their lower level of granularity, require slightly longer to understand failures. In return, they provide significantly higher prod-representativeness and business-value-delivered-per-hour-spent-test-writing.


>In my experience, this tends to result in tests that take a bit longer to run and, due to their lower level of granularity, require slightly longer to understand failures. In return, they provide significantly higher prod-representativeness and business-value-delivered-per-hour-spent-test-writing.

You are describing E2E tests here though. And this is the general problem with even discussing these things. It seems that literally everyone has a different mental model of the boundaries between each.



