Against Testing (tedunangst.com)
172 points by amatheus on July 7, 2020 | 230 comments



Software engineering is programming over time. It's not so hard to write code that is correct today; if that were all tests did, they wouldn't be worth the effort.

We write tests so that we know whether future changes have broken the system or not.

This article sounds like a reaction to the practice of writing many small, hermetic unit tests, which do little more than recapitulate the code under test. The weakest parts of a system are often in the joints, so the most important tests to write and to run are the integration tests, the ones that tell you with the most confidence whether the code works in the real world or not.

See, for example, this puzzling line from the article:

> Other tests grow obsolete because the target platform is retired, but not the test. There's even less pressure to remove stale tests than useless code. Watch as the workaround for Windows XP is removed from the code, but not the test checking it still works.

How do you remove the code for a feature but not the test for that feature, unless either you aren't running your tests or your tests aren't testing the things you care about?


I think the question "What's the return on investment on writing tests?" doesn't get asked enough. There's a blind assumption that writing lots of tests is always good, and code coverage tools tend to push the idea that you're not done until you reach 100% coverage.

Imagine some incubating startup where you have an initial 4-week runway for development, with the goal of getting a prototype working, so you can get it in the hands of some customers and begin validating a business case. How many tests do you need in this case? And how much time do you allow for it? Most of the code you're going to produce will be thrown away, and the more compelling the prototype is in terms of functionality, features and design, the stronger the signal you'll get back from potential customers.

To me there's still too much "testing is my religion" amongst developers and not enough "here's the quantified value of me spending 3 days writing tests and here's how that fits in the context of where our business is right now"


The benefit of testing is confidence that things are working. There are side benefits, like acting as documentation and examples, but they should not be the focus.

Writing tests (or deleting them, refactoring them, etc.) should always involve a cost/benefit calculation, even if it's a rough mental estimate that's not written down. In particular, that requires answering "How much effort will this test take me to write/debug?", "How much extra confidence will having this test give me?", "What level of confidence do I feel comfortable with?" and "Could I achieve higher confidence spending this effort on something else?".

This doesn't need much effort. For example, we might think "This array indexing took me a few attempts to get right; it could result in misleading reports, so I'd better throw a few edge cases at it to make sure I understand it correctly.". On the other hand we might think "This class doesn't have any unit tests, but there's already a bunch of validation on the results; it only affects the page layout, and it'll be obvious if something's wrong, so I'll leave it for now and put some extra checks on the gnarly payment system."

This approach makes sense regardless of the situation. In your startup example, it might make a lot of sense to write some tests that, for example, spider our site for broken links; or look for certain strings on certain pages. Things which are quick to write (they could even be shell scripts), but give us prior warning that our demo will go wrong. It would make less sense to write a pile of unit tests for the internals of some boilerplate piece of the system.
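
As a rough sketch of the sort of quick check I mean (the URLs and expected strings here are made up):

    # rough smoke test: check a few pages respond and contain expected text,
    # and that every link on the home page resolves
    import sys
    from urllib.parse import urljoin
    import requests
    from bs4 import BeautifulSoup

    BASE = "https://staging.example.com"
    MUST_CONTAIN = {"/pricing": "Start your free trial"}

    failures = []
    for path, expected in MUST_CONTAIN.items():
        resp = requests.get(urljoin(BASE, path), timeout=10)
        if resp.status_code != 200:
            failures.append(f"{path}: HTTP {resp.status_code}")
        elif expected not in resp.text:
            failures.append(f"{path}: missing {expected!r}")

    home = requests.get(BASE, timeout=10)
    for a in BeautifulSoup(home.text, "html.parser").find_all("a", href=True):
        url = urljoin(BASE, a["href"])
        if requests.head(url, timeout=10, allow_redirects=True).status_code >= 400:
            failures.append(f"broken link: {url}")

    if failures:
        sys.exit("\n".join(failures))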

Code coverage, dogma, etc. should not be the deciding factors for what to work on.


>The benefit of testing is confidence that things are working.

No, it gives you confidence that the tests are passing.

I have worked on code bases that are well tested and code bases with non-existent testing. The tested ones contain just as many bugs as the non-tested ones as far as the end user is concerned. In a year and a half working at the "well tested" company, regression tests caught an error one time. That was good, but considering the testing team was bigger than the dev team, I question whether it was actually efficient or not.


You misunderstand me. I never said that testing is an efficient way to gain confidence, or that tests give much confidence, or that only testing can provide confidence. Rather, I said:

> The benefit of testing is confidence that things are working.

Some people treat testing as a gamified metric (e.g. code coverage). Some people treat tests as executable documentation. Some people treat tests as some essential requirement, without which the software cannot be shipped. I'm saying that those are not good reasons to write tests. Tests should be written to give us confidence that things are working.

How much confidence will we gain? How much do we want? How much will it cost? Where should our efforts be directed? Those vary from project to project, which is exactly why I say thought should be given to the cost/benefit tradeoff.

(Personally I think automated testing can be an efficient way to quickly gain a lot of confidence in a system; especially if we do property checking of high-level functionality, instead of unit testing of low-level details. However, that's a separate point; we can disagree on that, whilst agreeing on the need for weighing costs versus benefits.)


Regression tests don't usually identify bugs in new features, unless they interact with the older code or required changing the older code. In my experience, they're useful to have, but not useful for identifying novel issues. I recall reading someone's analysis of applying fuzzing [0], which found that a particular fuzzer would eventually stop discovering issues, but that didn't mean the system under test was perfect. It meant that they had worked out most (if not all) of the errors that fuzzer could identify for them and needed a new fuzzer.

The same thing with regression testing. I wouldn't stop running regression tests just because they aren't identifying issues anymore, but I wouldn't rely on them to discover novel issues. That's what the new, updated test suite should be doing. A regression test suite will not have a test to cover a new feature. It will only make sure that the new feature doesn't break the old one.

[0] https://blog.regehr.org/archives/1796 - found it


> Regression tests don't usually identify bugs in new features

Neither do unit tests in most cases.


This doesn't seem to follow from my comment so I have no idea what your point is.


> To me there's still too much "testing is my religion" amongst developers

Maybe some developers. In the French web market, almost no one is.

> What's the return on investment on writing tests?

I only write end-to-end tests. Because I come mainly from a maintenance background, unit tests written for coverage are a hindrance. But testing the functionality of an app? That's what a client wants. They don't care about how you implement things, only that it works. And that old bugs don't come back.

Yes it means new functionalities have higher estimates. But they tend to not come back once done. And doing it with end to end tests means you should easily be able to trash all the code and replace it with something else doing the same work.

The problem is the tooling. Mocking external APIs tends to be a pain even with Wiremock. Testing a GUI is not easy: a Sikuli server to drive a GUI with screenshots would be a boon. And it tends to be slow. But that's because most of the work has been going to JUnit-like tests for years.


> "What's the return on investment on writing tests?" doesn't get asked enough

I consider tests to be highest return on investment for velocity.

The smaller the change you are making, the more likely your tests will assist you in delivering faster software.

Sample sizing is probably biasing me here, but all of our more legacy code bases have higher costs for feature delivery, to the point we have begun implementing logic at the API level (instead of the monolith) just to be able to minimize the overall cost. The code bases that get features faster are more modern React code bases that have tests, while the legacy code bases are Java or Scala with close to 3% code coverage.


> all of our more legacy code bases have higher costs for feature delivery

Compared to one another, the nominal cost of feature delivery in legacy is higher. This is a correct assertion from the perspective of the person maintaining those systems.

But from a business perspective, as long as the operational costs for that component don't exceed its revenue / value generated, keeping legacy code running is perfectly valid.

There's an opportunity cost calculation at its base: When does the compounded sum of all the time "lost" due to the nature of the code (lacking tests, architectural limits, ...) outgrow the cost of decommissioning / replacing the legacy code? As long as the latter costs more than the former, it makes sense to keep the legacy code around.

Writing tests comes at a cost as well. The same calculation applies here as well. Is it strategically sound to sink time and money in writing tests if the costs of doing so outstrip the marginal gains on efficient feature delivery over the projected lifespan of a legacy component?


Maybe. Maybe a feature-rich but rough and unpolished prototype is what will delight customers the most.

Or maybe focusing on an absolutely minimal core, but executing it really well with high performance and having it run rock-solidly will attract the potential customers more.

I think this depends a lot on demographics, but in broad strokes, I think the former is overvalued compared to the latter. I.e. contrary to your experience!


I would be inclined to agree, except people generally have low standards of performance (ofc depending on the industry!). People also rarely care about the hand of the puppeteer, the keep-the-lights-on parts, especially if we're talking about a prototype.


I think people care a great deal more than they have words to express. We're not very good at educating our customers in ways to discuss and analyse technical performance. We're getting better at it when it comes to uptime, but latency discussions still lag (hah!) behind.

(This is from purely personal experience. I have only had one customer ever, when asked about performance requirements, who could list some. Everyone else has been "I guess I want it to be... good?")


The prototype serves to figure out what the customer does not need as much as what they do need. Polishing the wrong thing is a waste of time.


For me tests are essentially just a repeatable REPL session. And like a REPL they can radically improve my development speed.

So asking whether these tests are worth the investment is silly, because they've decreased the time for me to write the code. It takes less investment.


I wrote a one-off test script to do a REPL session as tests. You just make a file with the commands you want, run it, and the system replaces the input-only file with the inputs+outputs. Subsequent runs replace the output.

Then you can check that file into version control, and whenever you run it you get a diff of the broken tests. If they're not actually broken, just check in the new file and you've updated your tests. It works really well for this project, a search engine.

https://github.com/pipedown/noise/blob/master/update-test-re...

https://github.com/pipedown/noise/blob/master/repl-tests/col...
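
The core of the idea fits in a few lines; a rough sketch (not the linked script; the CLI name and file format here are made up):

    # "repl session as test": read the commands from a file, run each one
    # through the system under test, and rewrite the file as alternating
    # command/output lines. `git diff` then shows any regressions.
    import subprocess
    import sys

    def run_command(cmd: str) -> str:
        # hypothetical CLI for the system under test
        result = subprocess.run(["./noise-cli"], input=cmd,
                                capture_output=True, text=True)
        return result.stdout.strip()

    def update(path: str) -> None:
        with open(path) as f:
            lines = [line.rstrip("\n") for line in f]
        # inputs are the lines starting with "> "; anything else is old output
        commands = [line[2:] for line in lines if line.startswith("> ")]
        out = []
        for cmd in commands:
            out.append(f"> {cmd}")
            out.append(run_command(cmd))
        with open(path, "w") as f:
            f.write("\n".join(out) + "\n")

    if __name__ == "__main__":
        update(sys.argv[1])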


Tests decrease development time!

Results may vary.


Yep, this is something I always try to instill in my peers when writing tests.

How much time does it take to get 100% code coverage, how much do you get paid, and if it was your startup would you burn through payroll cash to write them? When you start adding dollar amounts to things in terms of an engineer's salary, it's easier to see what is really important and what is waste. You are paid to write correct business logic. Tests are important, but only in that they verify your business logic is correct and stays correct.


I think in the situation you're describing, with a prototype being the difference between the business existing at all or not, you could justify all manner of sins. If you had a prototype already 90% done in COBOL it would potentially be the right call to finish that off - that doesn't mean using COBOL is the right thing to do in general. To use a common HN term, that is a situation where it is perfectly sensible to acquire technical debt, just as there are non-software contexts where it is financially sensible to acquire a bit of regular debt for a while. It doesn't shed any light on whether a lack of tests is technical debt.

(What you've described is a classic problem of junior devs trying so hard to do the right thing that they don't step back and ask why that is the "right thing" in the first place and whether that actually applies in their situation. I like to call this phenomenon the "overenthusiastic amateur". At least it tends to go away with experience - unlike the underenthusiastic dev who doesn't care what the aftermath is so long as their code works long enough for them to move on to the next thing!)


> "testing is my religion"

Not religion, but a necessary tool for knowing where to focus my attention.

> "here's the quantified value of me spending 3 days writing tests and here's how that fits in the context of where our business is right now"

How do I quantify 'getting the task done at all'?

EDIT: I mean it. To accomplish development tasks, I need to be able to run the code I'm working on and see that it works.

Are you actually suggesting that before I start a ticket, I should first do a quantitative analysis of the value of being able to do that? How?


> Imagine some incubating startup where you have an initial 4-week runway for development, with the goal of getting a prototype working

Are you talking about an actual prototype or an MVP? A prototype meant to showcase some idea doesn't need any unit tests. An MVP? probably yes.

Now back to the practicality of software development. Unit tests are fundamental when working in a team setting. What are people doing in PRs without unit tests?


> Imagine some incubating startup where you have an initial 4-week runway for development

Who have been asked to build something that does X, Y, and Z. Without tests how do they know when they're done? How do they convince other people that they've done what they were asked?


Absolutely agree. My biggest lightbulb moments with testing have come when needing to do significant refactoring. Without tests, I feel aimless because it's hard to get feedback about whether the refactoring has 'worked'. With tests there's a nice, tight feedback loop. Not saying it gives any guarantees, in my experience it has dramatically increased how quickly I could improve the code - to the extent where in some cases I wouldn't have even tried in the absence of tests because I wouldn't have been confident I could improve it.


Exactly. I picked up some legacy invoicing code that's 13 years old, and people are terrified of breaking it, so it's just quietly rotted away and half of it isn't even used anymore, but no-one can quite tell which half.

First thing I did was add tests so that I can refactor with confidence.


Agree with this and the parent comment. For the most part automated tests are for testing for regression. The highest ROI tests for complex projects are end-to-end in my experience - where the gateway interface is the API or UI - because they test an entire flow and can let you know if there is an issue somewhere, ideally with an error that can narrow it down to a specific function or file/module. It does seem like the author is talking about unit tests based on what he/she is saying, which can imo have less ROI, but they're still not useless. If you have limited resources (don't we all) then start with a developer working on end-to-end tests - it's further away from the code being tested but it's great 'bang for buck' - and run them on an automatic schedule; each commit may be overkill because end-to-end tests can take a long time.

The second part of the article mentions that stale tests are often not removed. However, if you run tests in CI or on a regular schedule and view the results every day, this can't happen; they would be removed or modified quickly. It does raise a valid point that within our industry test results data is often not handled well and is an afterthought. My hypothesis is that this is actually the reason some developers get frustrated with tests or don't see the value, and why you may get stale tests not being removed: the poor management of test results data (the output from testing), rather than testing itself, is the problem. A small plug here, but I started tesults.com to address this very problem. Integration takes a few minutes if you use a popular test framework. There's a forever-use free tier so try it out, and if you need more but don't have or can't get budget but love what it does then send me an email and I'll expand your free tier. The important thing is to try it, see if it makes things better for you, and use it as part of your release management.


Are you defining unit tests as per-class tests? This is the biggest mistake.

You should be testing a unit of behaviour, like a business rule or an effect to be expected. If this involves multiple objects collaborating then so be it. But they should not cross architectural boundaries like HTTP calls or database calls. If you can, make only public API classes accessible and make all helper classes inaccessible to users. That also forces you to test things from a public API standpoint only.

Tests that just check that an object calls another object, but exhibit no desired behaviour in themselves, are pointless and just couple everything to the implementation, i.e. certain methods are called in a certain order with specific parameters. When you do that, you make it really hard to change things (just design changes, not desired behaviour changes) without breaking every test.


This is part of the difference between the classicist and the mocking styles of testing. The argument for the latter goes that if (big if!) your collaborators are well designed and have good interfaces, then asserting that something is called is a meaningful assertion in the language of your domain (e.g. "check that an email is sent with exactly this content" is a meaningful assertion, even when you mock the mailer).
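
As a sketch of that style with Python's unittest.mock (the charge/mailer names are invented for illustration):

    # asserting on a collaborator call as a domain-level statement:
    # "a receipt email is sent with exactly this content"
    from unittest.mock import Mock

    def charge(customer, amount, mailer):
        # ...imagine the actual billing logic here...
        mailer.send(to=customer["email"], subject="Receipt",
                    body=f"You were charged ${amount}")

    def test_receipt_email_is_sent():
        mailer = Mock()
        charge({"email": "a@example.com"}, 42, mailer)
        mailer.send.assert_called_once_with(to="a@example.com", subject="Receipt",
                                            body="You were charged $42")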

I agree that in many cases, APIs are badly designed and expose implementation details that you shouldn't couple your tests to, but many still do that. Mockist proponents would claim that this is not the right way to do it (for example, they argue you should wrap bad APIs in better ones and write your assertions against the latter, see also "don't mock what you don't own").

I can understand the point of view that even then you wouldn't like these kinds of tests, but for me they do provide some assurance that "the right stuff happens" (of course, you need integration tests to test the wiring etc.) at a still lower cost than "you have to extremely strictly separate pure from impure code" (which has other benefits, yes, but is also really hard to enforce especially in a team).


I mean I am a functional programmer so i naturally make code pure until the edges anyway. In a functional language I will just replace the edge function call with a fake version.

But the problem with mocking is that you couple your objects to one another. You expect certain methods to be called, with certain parameters.

What if you want to add in an extra layer, or remove a layer, or add a helper? All your tests have now been broken, yet the overall behaviour has not changed.

You also have issues around mocks not actually being set up to behave like the real object.

If your object under test sends a null it may pass but the real thing may fail.

Instead, create an architectural boundary with a clear public interface. Exercise the interfaces in tests and then check for results on the other side of the boundary. You might do that with a mock or a fake.

You are now free to refactor from the start to the finish of that architectural boundary without breaking tests.

This is just fancy talk for only testing your public API. But defining those boundaries and public interfaces is where the skill is.

If you're building a web app using hexagonal architecture, I might say drive your primary ports from your tests, and mock or fake your secondary ports.

If I expect some group of objects to be used in multiple places, like a library, I will test those as if they're a boundary too. For example, if I've built a money exchange rate module.


> Instead, create an architectural boundary with a clear public interface. Exercise the interfaces in tests and then check for results on the other side of the boundary. You might do that with a mock or a fake.

I think mockist (test-doublist really) TDDers would suggest that the architectural boundaries derived from the design pressure are the "correct" public interfaces you're describing here, they just often happen to coincide with class (or your favourite language's equivalent) boundaries.

The pattern often ends up with many one-function role interfaces, orchestrated by collaborators down the dependency tree until you hit value structures, pure functions or external integrations at the edge of the system. In many ways, mockist TDD is a gateway to functional programming.


Hm, yes and no. I do feel that mockist TDD is still heavily emphasising the OOP idea (more in the original sense than in the Java sense) of having separate "collaborators" with their own internal state and side effects that exchange messages (in a way similar to the Actor pattern).

A functional approach (at least in a pure functional language) wouldn't necessarily emphasise this sort of interaction pattern between independent components and try to isolate state and side effects much more.


I agree this is what it is. But a lot of people don't understand it that way.

So I get rid of the existing already loaded terms, and just teach it directly without referencing it.


If you do functional programming, obviously many OOP techniques just don't apply. This is more of a difference in philosophy. I'm sympathetic to functional programming, and do try to use immutability, value objects, referential transparency, explicit state handling etc., whenever possible, but I'm still constrained by the languages, frameworks and teams that I work with, and I think many others are as well.

Of course, the question whether we should all just program in Haskell (or lisp, or erlang, ...) can be debated, but for a variety of reasons that is not currently the case, so I think mocks are still a valid answer for OOP, if (!) you use them correctly (and I agree that many may be too cavalier about mocking).

But to answer your question about your "extra layer": If your code is written in a domain-driven style, then potentially adding in a new layer should be considered a change in behaviour, so changing the tests makes sense. If it's purely a technical thing, then there are IMHO often ways of not exposing it to surrounding code. As an example, if you're introducing, say, a logging layer to your BillingService, instead of pasting that layer into the original code that calls the billing_service, you could decorate your BillingService with a logging wrapper class and just change the injected dependency. Nothing about the tests using the billing_service would have to change. This is a stupid example, but I hope it gets the point across.
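
A rough sketch of that decoration, continuing the BillingService example (the method names are invented):

    # logging added as a wrapper that implements the same interface; callers
    # and their tests keep depending on "a billing service", only the wiring
    # at the composition root changes
    import logging

    class BillingService:
        def charge(self, customer_id: str, amount: int) -> None:
            ...  # the real billing logic

    class LoggingBillingService:
        def __init__(self, inner: BillingService) -> None:
            self.inner = inner

        def charge(self, customer_id: str, amount: int) -> None:
            logging.info("charging %s: %s", customer_id, amount)
            self.inner.charge(customer_id, amount)

    billing_service = LoggingBillingService(BillingService())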


> What if you want to add in an extra layer, or remove a layer, or add a helper?

You make the extra layer implement the same interface and delegate to the original object. Classes don't depend directly on other implementation classes in this style.


The extra layer will transform the objects somewhat, so you would have to change what the original object would accept.

You have now broken the tests against the original object.


I think the idea is to have small and well-designed interfaces so that you don't need to change them to insert a layer.


Absolutely. I've worked in places which enforced this decouple-everything-from-everything approach, and it just grinds work to a halt, makes test failures commonplace and uninformative.

I think there's a definite problem with terminology when it comes to testing. I've written about this before; the latest example being http://chriswarbo.net/blog/2020-07-07-more_testing_terminolo... (just submitted to HN https://news.ycombinator.com/item?id=23757346 )


How do you avoid external dependencies in this situation? If you're only testing the public interface, then you have to expose some kind of public dependency injection pattern. Now you've introduced a level of indirection into the production code specifically to make testing easier. This feels like an antipattern.

It's how I do it, but I'm not satisfied with it. It makes the code harder to read, I sometimes miss bugs because my injected mocks don't handle an edge case correctly, and the effort of maintaining the mocks themselves is non-trivial.


For most web applications I barely use mocks except on the edges.

For mocking out boundaries, like an HTTP service I need to call, I do the following.

I will write an integration test against the real object representing the API. These tests will test the real thing.

I will then run those same tests against a fake object I've created which represents the API in tests.

Your fake and real objects are now guaranteed to work the same based on the properties you test.

I can now use the fake object in tests with a higher degree of confidence.

You can do the same thing around data access objects.
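
As a sketch of that workflow in pytest (the UserStore interface is invented): the same test functions run against both implementations, so the fake can't silently drift from the real thing.

    # contract tests: one suite, parametrised over the real adapter and the
    # in-memory fake, so both are held to the same observable behaviour
    import pytest

    class FakeUserStore:
        def __init__(self):
            self._users = {}

        def save(self, user_id, name):
            self._users[user_id] = name

        def get(self, user_id):
            return self._users.get(user_id)

    @pytest.fixture(params=["fake", "real"])
    def store(request):
        if request.param == "fake":
            return FakeUserStore()
        # in the real project: return PostgresUserStore(test_database_url)
        pytest.skip("real backend not wired up in this sketch")

    def test_saved_user_can_be_read_back(store):
        store.save("u1", "Ada")
        assert store.get("u1") == "Ada"

    def test_missing_user_returns_none(store):
        assert store.get("nope") is None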


I agree, fake objects are a good alternative to mocks especially for complicated interfaces (databases, external services, etc.). But they require more setup, which is why I still think mocks can be useful in some situations.


They can require more setup in that you have to write an object to represent the thing.

They require less setup in that mocks require tons of visual noise when setting them up inside the test, whereas a fake will just be created with a standard constructor.

I hate a lot of the mocking I see in real code. They go overboard with the mocking and the test is 80% mock setup, making it hard to see the real purpose of the test.

Tests should be short, simple, to the point, and easy to read. When most of the test is setting up a mock, you've lost that.

But they're not too bad if it's a very simple one-line setup. Which is how they should be used.

But that would not tell you that if you passed a null object into the real thing it would crash. Whereas a fake, with its tests, will.


I think it really depends. If you keep the number of collaborators in a class small and your interfaces are well designed, mocking is ok (and really just a limited form of a fake).

But sure, many people don't do that.


I don't agree that dependency injection is necessarily bad, but if you really want to avoid it/mocking, there are also other ways, i.e. structuring your code into pure and impure parts, only unit testing the pure stuff and relying on integration tests for the "wiring".


You’re talking about integration tests, not unit tests. Unit tests should focus on a single method. Integration tests can combine methods/objects/APIs/etc for business rules.

Your best bet is a combination of unit tests, integration tests, and e2e tests (there’s some old advice about a 70/20/10 split but this is pretty arbitrary).


Yes, this is what has spread around and what people think of as unit tests. But it's an awful way of splitting tests. Code bases following this end up with brittle tests that break at the slightest design change with no functional change.

Here's a good article Kent Beck retweeted on the subject. https://twitter.com/KevlinHenney/status/1266383805520084993

I define unit tests as fast, not crossing architectural boundaries, and operating on a well-defined public interface. And they test some kind of actual property you care about. I don't follow "strict" rules like one unit test per class, per function, etc.

A lot of classes just extract out to some helper class; they operate well together and the helper is not likely to be used anywhere else. Just make the helper class private and test them as a unit against some actual desired property you want to check.

Integration tests are when you bring external things into the mix, like databases or HTTP calls.


> just make the helper class private

Yes, if your class is small enough to not make sense to test, make it private.


If you want to make that argument, that's fine.

What I would encourage you to do is define the terms you're using, and don't assume that others are using certain words in the way that you mean them. I like that you've been explicit about your definitions here (although it may be better to describe rather than assert, e.g. "I treat unit tests as focusing on a single method", rather than "Unit tests should focus on a single method").

For what it's worth, I find the definitions you're using to be harmful (e.g. see http://chriswarbo.net/blog/2017-11-10-unit_testing_terminolo... and http://chriswarbo.net/blog/2020-07-07-more_testing_terminolo... )


You may enjoy this write up on testing taxonomy: https://blog.7mind.io/constructive-test-taxonomy.html


Indeed, especially having an axis extending from an individual function/method up to third-party services would presumably stop the "everything is a class" creep that's happened to so much testing terminology already.


There's more recent debate about what a "unit" is. The parent reply is not wrong. We're just using different definitions.


Small unit tests are valuable iff your small units are well designed and valuable. If they are brittle, temporary components they don't have much value, I agree, but writing mostly integration tests suffers from the combinatorial explosion problem and from making it hard to express succinctly what the important bit about a piece of code is (and besides, almost no test is 100% integrated, you always isolate something, even in full end-to-end tests).

If you have many functional, stateless components, the unit tests can be particularly valuable.

But I think many tests end up being badly written because people don't necessarily ask the fundamental question: "does this increase my confidence in the code?" I think that's the most important consideration when writing tests.


I agree that tests are an insurance against changes. They find regressions and are also lights in the darkness that point to where we have to work.

I'm moving 3 db fields (let's call them a, b, c) from 3 tables (A, B, C) to a fourth one (D) right now. Adding a, b, c to D is easy, the data migration is also easy (a single update from a CTE coalescing a, b, c from the 3 tables -- the value in a wins over b and over c). And now let's see if anything breaks (it has to.) I run the test suite. Only a dozen errors. Some are where I expected them to happen, some are totally unexpected. There are parts of the code base I forgot about in the last year or never heard about (it's a team of about 5 developers.)

Without tests, no matter the language, typing system, etc any change like that would be a nightmare.


Not to downplay tests, but this would be caught by a typing system if all access to DB fields was through an ORM such that all access could be type checked.

Not to say that everyone should use an ORM, but this would be a very simple property for a type checker to catch if you let it.


Possibly, but (simplifying a lot) the actual refactoring involves JSONB fields that store serializations of objects which include those fields plus a yet unknown amount of code that might be using them, etc. And we're ending up redesigning part of the UI because 1) it makes sense and 2) it makes the migration to the new backend simpler. Luckily we're pretty well covered by tests.


I agree with you on integration tests being the most useful ones. The problem is that they are generally the most time consuming to write.


You can always start with black box testing so you check that bit of the program matches some expected input/output pairs, and then add more white box integration tests if the code is important/fiddly. It’s an easy way to get some validation on a particular subset of the system for a small amount of effort.


Aside from everything else, writing tests is just as much about writing testable code. Not everything that makes code easier to test makes code better, but often times many things do; after all, nothing is easier to test than a simple, small, pure function with inputs and outputs.

I prefer trying to focus on table driven testing where I can write a single test that exercises the code various ways. This is not applicable to all types of software, but it is wonderful for things like parsers, emitters, algorithms, data structures... things that test well.
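
For instance (a made-up parser, sketched in pytest), a table-driven test might look like:

    # one test body, many cases: the table doubles as a running log of
    # fixed bugs and newly discovered edge cases
    import pytest

    def parse_size(text: str) -> int:
        """Parse strings like '10kb' or '3MB' into a number of bytes."""
        for suffix, factor in [("mb", 1024 ** 2), ("kb", 1024), ("b", 1)]:
            if text.lower().endswith(suffix):
                return int(text[:-len(suffix)]) * factor
        return int(text)

    @pytest.mark.parametrize("text,expected", [
        ("0", 0),
        ("512b", 512),
        ("10kb", 10 * 1024),
        ("3MB", 3 * 1024 ** 2),
    ])
    def test_parse_size(text, expected):
        assert parse_size(text) == expected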

Unit tests like this are cheap but make it easy to assert that the code works. If you can assert that your individual functions do what you expect, it makes debugging and understanding software easier.

I like to think of testing as executable debugging. It’s like a debugging session that is executed over and over again. If your tests are difficult to maintain it may say something about what they are asserting or what they are testing.


When people say "testable code" I always wince. Unit tests fucking suck at testing most code. Really suck. They're ok (ok, still not great) for things like parsers and algorithms... which isn't most code.

But, rather than improving test tools to handle more types of code, developers say that they want to completely restructure the code to make it more amenable to the fact that unit tests suck. Unit tests can't handle database? You need "Testable code"! No...

Developers code to a spec, but unit tests suck at encoding a spec except in the few cases you alluded to simply because most specs are not easily, clearly encoded in the form of your turing complete programming language with its mediocre tooling.

The practical upshot of this theoretical brain damage is that when people write unit tests on code where it isn't a suitable tool it tends to be an expensive waste of time. The tests cost time to build, time to maintain and when they fail it means... "oh, you changed some code". Thanks, test.

And then religious unit testing people always argue that it wasn't the tool that was at fault it was you.

Working on projects with excellent integration test tooling really opened my eyes to the possibilities of another world - one without unit tests.


I don’t have much to say beyond what I already did re “testable” code; forcing code to be “testable” can be bad, but in reality making code more testable probably mostly amounts to breaking it down into simpler, functionally pure bits.

Not everything unit tests terribly well. I haven't seen it work very well for things like React or Angular components. It does work well for small bits of apps. For example, let's say you have a component where a good deal of what it does is extract information out of a URL. You might have tests that exercise the entire UI, entering URLs and checking the HTML output.

The “more testable” version of this, imo, would be separating the URL parsing and extraction bit to a single free-standing routine that outputs some data given an input URL. You can then table test that bit. Then testing that this ties into the UI correctly could be done in the integration or end to end testing.
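
A rough sketch of that separation (in Python for brevity; the function and URL shape are invented):

    # the extraction logic pulled out into a free-standing pure function;
    # only this part gets unit tested, the UI wiring is left to e2e tests
    from urllib.parse import urlparse, parse_qs

    def extract_report_params(url: str) -> dict:
        parsed = urlparse(url)
        query = parse_qs(parsed.query)
        return {
            "report_id": parsed.path.rstrip("/").split("/")[-1],
            "filters": query.get("filter", []),
        }

    def test_extract_report_params():
        url = "https://app.example.com/reports/42?filter=eu&filter=q3"
        assert extract_report_params(url) == {
            "report_id": "42",
            "filters": ["eu", "q3"],
        }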

In the case of table-based unit testing, I think it often works great, as it can act as a running log of regressions and newly discovered edge cases that can even serve as a sort of document of expectations for other developers, and while it cannot be used to test all sorts of code, it has wide applicability and you can see it in webapps, Go servers, the Wine and libinput sources, etc.

It’s easy to get stuck on a single strategy to rule them all, but I think that often is a bit presumptuous and maybe dogmatic. Tests are a toolbox. Not every problem is a nail.


>in reality making code more testable probably mostly amounts to breaking it down into simpler, functionally pure bits.

I know. The problem is:

* This process often introduces bugs. How are you going to catch those bugs? Not with your tests; you're changing this code precisely so you can write tests. It's a catch-22.

* Sometimes people do this only to discover that the simpler "functionally" pure code is pointless to test because it's so trivially simple. Somebody literally did that today on the code base I work on. The code as a whole still has bugs but those tests won't ever catch one. They'll just break when the code changes. Plus that "refactoring" probably introduced bugs. This I think is what the concept of "unit test induced design damage" was getting at.

This isn't a problem with tests as a whole. Or TDD. It is partly a problem with people who use the terms "unit test" and "test" interchangeably (this engenders entirely the wrong kind of thinking). It's mostly a problem with unit testing as a concept (i.e. not the specific frameworks themselves).

Having nice clean code interfaces is also often conflated with unit testing - this is a mistake. One does not necessarily lead to the other.


Michael Feathers' "Working Effectively with Legacy Code" goes into some detail on this. When you have bad, untested code, you don't start refactoring. You start writing high level "characterisation tests", then you refactor. After that you can still write the unit tests for the better components.


I've followed this process a few times and then noticed that the last step didn't really add much value.

It felt good at the time doing it because that's what I was "supposed" to do. I'd achieved the supposed "testable code Nirvana" and... meh.

The first step was life (or at least, career) changing though. Bringing a piece of shit code base under control with integration tests was a process that blew my mind.

That's what led me to start questioning the efficacy of jamming architectural changes into code in order to sacrifice at the altar of the unit testing gods and that maybe, just maybe, unit tests' steep demands and limited value means that they suck.


Unit tests are great, for testing units.

What self-contained units are there in your code? If you are doing low level system programming, I bet there are a lot of them. If you are doing high-level CRUD, I bet there are none; all of them you import from third parties.

Writing unit tests for non-self-contained code is crazy, and leads to all those problems people identify. In my experience, the problem is that the most vocal evangelists of unit testing all work on the high-level stuff.


You're talking about things like dependency injection. I'm not a huge fan of testing but "testable code" when written in a certain way (without dependency injection) actually has many external benefits that don't have to do with testing per se.

Really the way you need to structure your program is to divide functions in your code between things that can be unit tested and can't (IO calls).

This can easily be done without dependency injection which is likely what you're complaining about.

For example, don't do this:

    # unit testable only with a mock (bad!)
    def add_one(key: str, database: Database) -> int:
        return database.get(key) + 1

Or, even worse:

    # unit testable only with a mock (bad!)
    class Adder:
        def __init__(self, database: Database) -> None:
            self.database = database

        def add_one(self, key: str) -> int:
            return self.database.get(key) + 1

Do this instead:

    # IO function, not unit testable
    def get_database_value(key: str, database: Database) -> int:
        return database.get(key)

    # unit testable without a mock (good!)
    def add_one(value: int) -> int:
        return value + 1

    # IO function, not unit testable; just composes the two
    def composition(key: str, database: Database) -> int:
        return add_one(get_database_value(key, database))
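
The pure part can then be checked directly, with no mock in sight (a tiny sketch):

    # no database and no mock involved: just data in, data out
    def test_add_one():
        assert add_one(41) == 42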


You need a combination of unit tests and integration tests. Even trivial tests can help prevent someone ‘fixing’ your code in a refactor and breaking it, by defining the properties of what the code should do through tests. But it depends on the project - sometimes it just doesn’t matter, sometimes you need more formal V&V.


You need to think beyond the concept of unit tests and integration tests.

The future will have higher level and lower level executable specifications. The future won't have "unit tests" and "integration tests".


I work in a formal methods environment with proof checkers, property based testing, Isabelle/Coq/TLA+, and we still have unit tests and integration tests as concepts. You need a common language and these work well.


The distinction is easy. A unit test tests whether a single unit works as expected. Generally, a failed unit test comes with a good pointer to what is wrong, and failed unit tests are rare.

Meanwhile integration tests test whether multiple units are integrated properly to produce expected results. When an integration test fails, you don't really know what is wrong. Instead you look at or write unit tests for the smaller units to see whether they fail, or whether the way you tied them together has a flaw.


The distinction is easy, yes. It's still important to think beyond the concept of testing.

For instance, there is no split - no separation of concerns between specification and execution in a unit test.


I don't quite get what you mean in the final sentence.

Are you saying your specification and your unit tests should be seen holistically?


I'm essentially saying that we should write specifications and a translation/implementation layer that "tests" the specification against reality. Maintain a separation of concerns.

For instance.

This is done in theory by cucumber/gherkin but it's done pretty badly. Good concept, bad implementation.

My broader point, though, is that unit tests are a bad concept and bad abstraction for this among many reasons but it's so embedded in programming culture (to the point that when people say "test" they automatically mean "unit test") that other approaches almost become "unthinkable".


I think you mean something like system testing, but backed by BDD-style specs?

TBH, I personally think you're being a bit too hard on unit tests - the number of times they reveal bugs in our code bases (and in my own code - the horror!) is ridiculous. I do find them really valuable.

That said, testable OO code (I'm primarily a C# guy) has certain constraints, and sometimes making it testable results in a high level of abstraction - so much so that individual tests almost don't seem to actually test anything meaningful, and it becomes difficult to see where the logic and behaviour is, without digging through 42 layers of abstract classes, interfaces and factories.

Recently I've come to favour a kind of inverted test pyramid, where instead of unit tests forming the most substantial foundation, system tests do instead, followed by integration tests, and finally by unit tests at the tip. I find this leads to a "sensible" level of abstraction, where unit tests are used where they are most valuable, and system tests keep tests meaningful. Depending on your code, it might be quite tricky to setup systems tests, and it might need quite a large time investment, but IMO it's worth it. If you're able to dockerise your entire solution, then it's much easier to do.


I’m bearish on unit tests as well. Integration and e2e tests get you closer to the spec and are easier to change when the spec changes or your product inevitably evolves. I reach for unit tests when something is foundational or hairy, but IMO they should be a last resort. They fill your application like sand in gears otherwise.


I find I often have an integration test that is failing, and then I start adding unit tests until I find the error. Especially when the failure is not amenable to investigation by a debugger.


Have you had a chance to check out the Screenplay Pattern at all? I have had great success using Cucumber to stitch together common abilities/actions/question types. My step definitions usually consist of a single line of words grouped by a few parentheses and commas. Groovy/Kotlin make this particularly nice.


You seem to be describing property based testing?


What do you see as good integration test tooling?


Something where the integration tests:

* Are clear and easy to read and form a specification of sorts.

* Which are cheap to build.

* Where the interactions the code has with the "outside world" are easy and cheap to mock/test/keep under control - e.g. time, database, browsers.

* Has good debugging tools such that it's easy to track down the source of the bugs.


I like to think of tests and the team's emotional relationship to them as a barometer of the extent to which the repo is and has consistently followed best practices in the language. IOW, enforcing unit and functional testing forces developers to follow best practices.

If I find myself dreading writing and maintaining tests, it's a great indicator that the code diverged from best practices at some point in the past, and that the technical debt has just been accumulating ever since. Difficult to test code is code smell telling me that the technical debt is getting out of control or that the code is getting unmaintainable, and this can be difficult to walk back.

Another great indicator is the extent to which the team "detests" mocking. Mocking is great -- when it's trivial to do. When it's not trivial, and elaborate mocking behavior and reflection and testing of private methods, etc. is needed, it's another indicator that best practices have been sacrificed for expediency.


Write a test for a simple identity function that simply returns its only argument.

An identity function should be as testable as they come.


If I really needed that I'd use property-based testing. I'm on my phone or I'd code an example. But basically it generates test inputs and gives you confidence by running some number of random tests that verify the property holds, or finds a counterexample.

More realistic but basic example is to show that a function is its own inverse (reverse reverse list == list) or that one function is the other’s inverse (decode encode plain == plain).
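
Roughly, in Python with Hypothesis (sketching from memory, untested), those properties would look something like:

    # property: reversing twice gives back the original list
    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_reverse_is_its_own_inverse(xs):
        assert list(reversed(list(reversed(xs)))) == xs

    # property: decoding inverts encoding
    @given(st.text())
    def test_decode_inverts_encode(plain):
        assert plain.encode("utf-8").decode("utf-8") == plain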


In ScalaCheck:

    forAll { (x: String) => identity(x) == x }


Unless you’re implementing a semi-group which can be a pretty common design in functional programming


String value = "postalrat"; Assert.assertEquals(value, identifyFunction(value));


  function identifyFunction(a) {
    if (a == 'nelsondev') return '';
    return a;
  }


Testing actively malicious code seems to me like a different problem than testing regular business logic. Yes, the code you're testing could also look for the running test code in memory and rewrite it to always return success, but why would you write code like that?


American Fuzzy Lop ( https://lcamtuf.coredump.cx/afl/ ) will detect that (at least, it will in C, where I tested it).


And that's where the developer writing the test would check the test branch coverage and see that an obvious branch has not been tested.


I see you are an Enterprise Developer From Hell https://fsharpforfunandprofit.com/posts/property-based-testi...


I'm not sure what your point is here - your implementation will fail the parent's test?


Their point is that the test case provided would lead the developer to believe that the implementation is a correct identity function, when in fact it isn't.

Just clarifying, I do not take this as a good reason to not write tests.


Ah, right. I didn't realize some people believe tests should "prove" the correctness of a function (or that if they can't, they're worthless).

Testing has its flaws, but that's not one of them. There's often very good reason to ask the question "does this function operate the way I expect it to when I provide this input?"


Why did you type this? (I'm pretty sure it doesn't make the point you think it makes.)


  def identify_function(a: str) -> str:
    if a == 'postalrat':
      return a


For Testing:

1. Tests aren't necessarily for you, they could be to convince somebody else that your solution is robust enough to be used for their use-cases.

2. Tests aren't necessarily for today. They could be about preventing code regression in the future (some dev comes in and makes "performance" changes for example, not realizing they are breaking the code for others).

3. They can be about testing that the code behaves how you believe it behaves. Sometimes even simple code requires a sanity check.

4. 100% test coverage is likely impossible, but if we do find errors, we can learn from them and try to prevent them happening in the future by adding them to the tests.

5. Tests breaking are a good thing. They inform you that some change you are making is changing the behavior of something you thought acted reliably.

6. With regards to the same mind making the tests, this could also be a useful tool for ensuring a developer of some code thought of most of the edge cases when merging some code. "Ah, I see you didn't add minus numbers to the tests, how are those handled?"

7. We should be able to freely rip out tests and add testing as the requirements of the code base change. Generally though, the goal of much code doesn't really change, despite perhaps how exactly it achieves the task does.


>100% test coverage is likely impossible, but if we do find errors, we can learn from them and try to prevent them happening in the future by adding them to the tests.

You do know there's a way to verify a function to a degree of 100% without writing a single test?

You do also realize that for even a trivial function f(x) = x + 2, the only way to achieve 100% coverage is to write:

    assert f(0) == 2
    assert f(1) == 3
    assert f(2) == 4
    assert f(3) == 5
    ....
    assert f(N) == N + 2 where N = infinite. 
There are infinite possible inputs and infinite possible outputs, so to get 100% coverage you need to write infinite tests. Because your "tests" can never even approach this number, most of your "coverage" really amounts to a number close to 0%.

The question is, why does testing seem to kinda sort of work even when our test coverage covers only an amount roughly equal to 0% of all inputs to the program?

It's an interesting question with an interesting answer. Suffice to say "test coverage" is a garbage statistic.

One way to look at testing is that you're taking a statistical sample of a population. You take a small sample and do a statistical measurement on that population and if all tests pass for a sample you assume that the correlations implies that the entire population of inputs will pass all tests.

So in a sense testing is just science. We try to establish correlations among a given observational data set and we assume that this is true the entire population of data including ones that aren't seen.

Except the above isn't actually true either...

The reason is most test writers don't randomly sample test input data. Their methodology is very divergent from the way a statistics expert gathers data. Test writers don't write functions that randomly pick a set of data to test... instead they're highly biased in the tests that they write... for example a typical test set can look like this:

    def F(x): return 320302 / x

    assert F(2) == 160151
    with pytest.raises(ZeroDivisionError):  # needs: import pytest
        F(0)
As you can see, the above two tests are highly biased, with the second test picked in order to deliberately cause an error. Imagine if the test data was randomly picked? It would be highly unlikely that the random picker would draw zero as an input test case, indicating that statistical sampling may not be the best way to pick test data....

So if our test cases are biased, then why do tests kind of sort of work? Or do they not actually work? How is the programmer picking a test and how does that influence the overall correctness of their program? Perhaps it's not the test itself or the amount of tests that the programmer is writing, but more about how the tests influence the way the programmer thinks about the function...?

I would say the last microservice I wrote (in Python) has been virtually 100% bug free ever since it went into prod. I also didn't write a single unit test for it. I did do some manual testing on the system, but the application itself does not have a suite of unit tests to protect it and it has had 0 bugs ever since it was placed into prod.

I would still say a testing suite is good for new programmers diving into an unfamiliar system attempting to change things haphazardly, but in terms of correctness I question this strict, almost religious adherence to unit testing.

Food for a thought.


> You do know there's a way to verify a function to a degree of 100% without writing a single test?

I carefully tried to avoid the word "function", as mathematically they tend to be well defined. As soon as you have anything even mildly complex or some element of randomness - suddenly the number of required tests to brute force the problem can explode.

> I would say the last microservice I wrote (in Python) has been virtually 100% bug free ever since it went into prod.

There are never any bugs, until there are. Also there is quite some difference between code that is easy to reason about and code that is not.

> I would still say a testing suite is good for new programmers diving into an unfamiliar system attempting to change things haphazardly, but in terms of correctness I question this strict, almost religious adherence to unit testing.

I think relegating bugs to something only new programmers write is unfair.

I doubt you have a full compiler in your head or could even begin to consider all possible states of some code that could be considered complex. If your existing code is complex enough, chances are that you already have introduced some bug.

To be honest, I don't write high test coverage either for most projects, but I am sure to write tests for code that I either have trouble reasoning about or that is of high enough complexity. It happens, even for veterans. Sometimes when pair programming I have even spotted very seasoned programmers making such mistakes when they are tired.


Two harmful ideas I got out of university: Tests must absolutely cover as much code as possible and if something cannot be solved perfectly then any solution is wrong and useless.

I doubt that this was the intention or that anyone there teaching actually thought that, but the way things were taught to us we ended up thinking we had to test every single getter and setter. And that the traveling salesperson problem is essentially unsolvable because no efficient algorithm exists.

With a tiny bit more nuance you then find out that how much testing is useful depends on the domain/industry quite a lot, and that there are usually plenty of "good enough" solutions to seemingly impossible problems. Sweeping, extreme generalizations are for the inexperienced.

I love integration tests. You can specify your use cases right there in the code (maybe link to some official document), and often you are basically writing usage examples for your API so that someone new to the code can go straight to the tests to get a nice overview over how it's used. Regression tests can save you from looking like an idiot. I'm not going to pretend that testing is on the same level as something like formal verification, but as long as you don't overdo it I think it still has a lot of value.


It's important to remember that our own mental model is a very important part of any project. I've often read complaints about people tracking down a test failure and finding out it's actually a problem with the test and not the implementation, as if that were a waste of time. However, we need to ask the question: why did someone write that test, and do it in that way? If the answer is "because they didn't understand how the system works" then we've just exposed a bug in someone's mental model (often our own). It's not a waste of time to find and fix these failures (either by changing the test, now that we've improved our understanding of the system; or by removing it as superfluous). The article's example of checking for the round-robin algorithm rather than a lack of race conditions seems like one of these cases.

Of course, this isn't the only reason why supposedly-spurious test failures occur. If the answer to "why did someone write this test in this way?" is something like "to increase code coverage" or "they watched a talk that said to do things this way" or something equally silly then that's harder to spin as a positive. Hopefully it might at least expose failures in those development practices?

Unfortunately the developer who has to fix these failures is often someone who already understands, and wouldn't have written the test that way. Perhaps it's exposing a problem in their documentation, coding guidelines, etc.?


I couldn't tell if this is a joke or for real.

My 2 cents regarding testing. Written tests have two goals. 1: They help you build your code (you verify your code as you write it). 2: Regression testing (new changes have less chance of breaking your code).

We as devs need to hand over/deploy verified code. Without tests that means manual testing, and without manual testing it means unverified code. Manual testing means a slow development process.

It's 2020; I think we have an easier time writing tests than before, with all those cool IDEs, tools and frameworks. No tests means laziness and/or cockiness. Imho of course.


> you verify your code as you write it

That's what you do, tests or no tests. Write some code, go to the browser, check it works (assuming it's a web app). I don't think even the most undisciplined of developers just writes code and assumes that it works.

>Manual testing means slow development process.

I don't think it's necessarily slower; in fact, initially it is faster, which is why so many places don't have much in the way of tests. Much of this depends on the size and scope of your project.

I have worked on systems with lots of tests and systems with next to no tests. The tested systems were not "better", they still contained bugs and poor abstractions. The regression tests were useful, but the team required to maintain them was even bigger than the development team.

One thing that doesn't get discussed is that tests provide a way to run bits of code independently of the rest of the (probably overcomplicated) system. At the last two places I have worked, getting a dev environment set up to run the application took a couple of days' work at least.


> Write some code, go to the browser, check it works (assuming it's a web app).

That's a test. It's called manual testing, and in reasonable organizations there'd be a list of what test actions (as a checklist, often) should be done in this fashion.


The problems I see with "open your browser" for quick testing: 1. it's a short-term gain, and the team will be slower (will they know how to test it?) 2. how much code needs to be written before you can even use the browser to test it?


> I don't think even the most undisciplined of developers just writes code and assume that it works.

What does 'works' mean? Implements this narrow bit of functionality / change? Or doesn't break all the other features it's piled on top of?

It's not so much a question of whether a developer thinks they're testing as of what they're trying to achieve.


The same as writing a test that "passes". It appears to do what is expected for the set of inputs given.


> help to build your code

Some tests should exist as scaffolding while you are working on a project and then you can throw them out afterwards.


One strategy that I'm really enjoying at a current gig: have the person writing the front-end write the end-to-end tests (or at least the specs) for the person writing the back-end. Just having a fresh set of eyes on the tests has flushed out a lot of edge cases, and tests make a perfect place to iron out small behavioral details. It really solves the "same mind writing the code and tests" issue.

Of course, this isn't always feasible (and is a little unfair to the frontend person in terms of workload).


Wasn't this the point of "real" testing/QA teams filled with SDETs? "Professional testers" were often able to do things like this and enjoyed it. The problem was that testers were often considered lower class and thus had a self-fulfilling prophecy of decreasing talent in most companies until they were eventually cut, but for the orgs that do still have QA teams, the main point is that they bring a fresh set of eyes and are focused on breaking things in ways the designers and engineers couldn't ever think of.


Well written tests make refactoring a breeze.

Poorly written tests make refactoring nigh impossible, and usually don't contribute to anything other than code coverage metrics.

Focus on quality, not quantity.


A good type system and (mostly) functional code also get you quite far with this.


Coverage analysis in particular gets to be overkill when you have a good type system and a good model.

But type systems do not help you with things like "do we gracefully fail when we hit this error path?" or "do we actually drain the queue in all cases?".

Few things but contrived tests do, really.
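
For instance, a small, contrived check along these lines (Python sketch; `Worker` and `run_until_empty` are made-up names) is the kind of thing a type system won't give you:

    import queue

    def test_worker_drains_queue_despite_failures():
        q = queue.Queue()
        for job in ["ok", "boom", "ok"]:                  # "boom" makes the handler raise
            q.put(job)
        worker = Worker(q, on_error=lambda exc: None)     # hypothetical worker under test
        worker.run_until_empty()
        assert q.empty()                                  # drained even with a failing job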


Yes, they do.

And your second example is a very common thing to do with them.

(The first isn't common, but is doable; it's common to use types to enforce that failure is handled, not that it fits some extra requirements.)


Well written tests make trivial refactoring a breeze.

High level refactoring means writing new tests. Which often kills the effort before you begin.


"Refactoring" normally refers to changing code's implementation without affecting its external behavior. By definition, refactoring should not mean needing to write new tests.


The line demarcating "external" and "internal" can be quite mobile, and moreover, context-dependent. Especially when refactoring.


No! Never change the behaviour and the implementation at the same time! It is the same for a class or function or monoid or whatever. API change and implementation change should never happen at the same time. If you covered the old behaviour with tests (at the appropriate granularity) you can refactor the IMPLEMENTATION and verify that the system's tested invariants hold by running the tests. There are a lot of books written about this, but really the main idea is this simple.


Could I ask what you mean by "external" and "internal" here? Could different parts of a codebase ever count as "external"?


I think they are referring to behaviour, not "parts of a codebase", as the GP comment used the phrase "external behavior".


I wouldn't make that assumption. For many people "external" means "in a different class to this one", for others "external" means some other process or service (e.g. a DB like MySQL).

It's a widespread problem that people can end up talking past each other, since they make assumptions about the definitions of terminology that other people are using.

I've written about this e.g. http://chriswarbo.net/blog/2017-11-10-unit_testing_terminolo... and http://chriswarbo.net/blog/2020-07-07-more_testing_terminolo...


You're right that terminology is fluid and people can talk past each other. But you have to draw the line somewhere. The original comment by fenomashas has only two sentences: the subject of the first was "external behavior" and the second was on the same topic. The comment after that by invalidOrTaken, which you replied to, is also very short and was specifically asking about fenomashas's use of the word "external". We can be as confident as it is reasonably possible to be that invalidOrTaken therefore was asking about external behaviour, not code. (Incidentally, invalidOrTaken's comment was also about clarity of definition, but not the same difference you were questioning - at least if I understood you right!)

My own use of the equivocal language "I think" didn't help here. That was a bit of classic British understatement. But really, the two comments we're talking about are unambiguous.


fenomashas's comment seemed clear to me.

invalidOrTaken's claim that refactoring can affect what is internal versus external, or that it can be context-dependent, was surprising to me. I would consider changes to what's internal versus external as breaking changes; large changes could plausibly be called redesigns. Certainly not refactorings.

The context-dependence of internal/external makes sense, but that makes me think in terms of e.g. "MySQL is external to the application" versus "MySQL is inside the application's container", or even "MySQL is internal to the subnet where the application runs". Architecturally that makes sense, but I don't think it would affect the way I test things, or classify those tests.

On the other hand, invalidOrTaken might be using "external" to mean "other classes", in which case it's easy to imagine a refactoring changing all sorts of (class) boundaries. Likewise, the phrase "external behaviour" could simply mean "the public methods of a class".

I've certainly worked somewhere that claimed to follow "Behaviour Driven Development", where each "behaviour" was a micro-managed implementation detail of a particular method of a class, with a huge pile of mocks to simulate the behaviour of everything else.


What I was envisioning, though I'm sure the motivated reader could find other situations that are similar, was a refactor I was doing a week or so ago on an internal library.

The "external" interface, as in, the API presented by the library, mostly did not change. But internally blocks were broken apart, new ones formed, and logic ripped from one function and placed in another.

To someone seeing the whole, or "the library," the changes were 99% internal (with some changes in how options were interpreted).

But from the perspective of any one function, boundaries were crossed quite promiscuously, and if we'd had unit tests for them, we'd have needed to (re-)write some.

It's my experience that we are not very good at saying, "THESE are the chunks into which the program should be divided and thought of," and then never altering from that. There's always some cross-cutting concern that comes up. For this reason, I am suspicious of unit-testing-by-default, as "what is a unit" is often in flux.


> "Refactoring" normally refers to changing code's implementation without affecting its external behavior.

To verify this invariance, you have to test it specifically, and that means new tests.


Those tests for those invariants should already exist, if you're already testing.


It just so happens that minor refactoring is far and away the larger use case. That seems like a pretty good win.


The problem is that at the time you write your stuff it's not always very clear what's well written and what's poorly written. I often encounter old code written by myself that seemed like a really good idea back then but looking at it now it was just bad.


Agreed. The rant to me reads as "Writing good tests is hard. Bad tests are not worth it."

That raises the question of how valuable the skill of writing good tests is. For quality software it is. For a career, unfortunately, not in many companies.


I feel in a few years people will look back and laugh at all the useless tests we wrote.

I still haven't seen a test that can prove even an identity function is correct. Yet it's obvious to the programmer.

We lack the technology to properly test software. I like to think tests are a way to implement twice and hope you got it right at least once. But we aren't even to that point yet.


Most tests aren't intended to prove correctness - that is a very high bar. They're instead intended to offer some evidence that increases our level of confidence in correctness.

Techniques that are intended to prove correctness - like for example symbolic execution - easily handle the identity function (of course!).


I should add that even if you have the technology to prove correctness of program, that doesn't mean it's practical to prove correctness of all behavior of the program. I write real life programs in Agda (which is a dependently typed dialect of Haskell in which you can encode arbitrary proofs in type system) and you rarely prove all intended behavior. For example, recently I was writing a parser, and wanted to prove that for all languages I define, they're not empty languages (so that I don't accidentally construct empty languages by writing something like `between 'z' and 'a'` (silly example as this is easy to prevent)). So you start with basics, proving that the language that only contains empty string is not empty. Then you define what it means to be empty (empty is empty, empty+empty is empty etc...). Then you eventually need to prove L1+L2 is not empty if either L1 is not empty or L2 is not empty. Fine, just give a "witness" to each non-empty language so that we at least know these languages have 1 word in them. Then you have to prove things like "if string in L1 then string in L1+L2". Anyway, long story short, this one extra check I wanted to make, ended up being its own project that took maybe a week. It was definitely -- and I can't stress this enough -- definitely not worth the time.

Being engineers requires you to assess the risk, and if something is not worth checking, there is no reason to check it.


> I still haven't seen a test that can prove even an identity function is correct.

As with any hypothesis (empirical) testing, software testing is not about proving code is correct, but about striving to falsify that claim by poking at ways it would be likely to fail if it wasn't.

(There are formal analytical methods for proving code correct as opposed to empirical methods of falsifying it, as well, but those aren't tests.)


I wrote an LU decomposition function yesterday. I then wrote a test that tries to reconstitute the original matrix from the LU decomposition, for a few randomly generated matrices.

That is not 'implement twice and hope you get it right at least once'. It isn't a perfect proof of correctness either, but it gives much more guarantees than a simple double implementation.

Notably, when my code to solve a linear system using LU decomposition failed, I knew I didn't need to worry about my decomposition, because those tests still passed.
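
Roughly like this (Python/numpy sketch; `lu_decompose` stands in for my own function and is assumed to return L and U without pivoting):

    import numpy as np

    def test_lu_roundtrip():
        rng = np.random.default_rng(0)
        for _ in range(10):
            a = rng.standard_normal((5, 5))
            l, u = lu_decompose(a)          # hypothetical: my own decomposition
            # Multiplying the factors back together should reconstitute the input.
            np.testing.assert_allclose(l @ u, a, rtol=1e-6, atol=1e-8)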


The crucial difference between tests and implementations is that implementations have to satisfy all of the requirements at once, while tests can check a single thing at a time. As a simple example, if we need to test a 'sort' function for lists, we might write a whole bunch of tests like this:

    "Sorting" should "preserve length" in {
      forAll() { (l: List[Int]) => assert(sort(l).length == l.length) }
    }

    "Sorting" should "be idempotent" in {
      forAll() { (l: List[Int]) => assert(sort(l) == sort(sort(l))) }

    "Sorting" should "put elements in order" in {
      forAll() { (l: List[Int]) =>
        whenever(l.nonEmpty) {
          sort(l).foldLeft((true, l.min))({
            case ((result, max), elem) => (result && max <= elem, elem)
          }) match {
            case (result, max) => assert(result && max == l.max)
          }
        }
      }
    }

    "Sorting" should "not change elements" in {
      forAll() { (l: List[Int) => {
        val sorted = sort(l)
        assert(l.all(elem => sorted.contains(elem)))
        assert(sorted.all(elem => l.contains(elem)))
      }
    }
Each of these tests can focus on one aspect of sorting and ignore everything else. Each on its own is not enough to give us confidence in the 'sort' function (e.g. the identity function would pass the idempotence test, the same-length test and the same-elements test; returning an empty list would pass the elements-in-order test; etc.), however together they are pretty good.

In contrast, we can't implement the 'sort' function in such a piecemeal way. It needs to take every requirement into account, in case a step which satisfies one requirement invalidates the others. That's also why writing a whole new implementation to test against is best seen as a last resort.


Tests do not exist to prove correctness.

Tests exist to help prevent you from changing behavior unknowingly.


I'd argue that tests can also be a tool for improving your architecture (e.g. separation of concerns) and for communicating intent and usage.

Tests can be a tool for documenting expected behaviour, intended use, and edge cases. In contrast to written documentation, they have to be up-to-date by their very nature, which is an advantage over code comments or external docs.


> I like to think tests are a way to implement twice and hope you got it right at least once.

This to me is the biggest thing that makes writing good tests a "grind". For a simple, well-defined function, you can toss up a bunch of inputs and outputs and feel satisfied that all the corner cases work. For any moderately complex behavior, however, so far the only effective way I've found to test the behavior is to effectively duplicate the logic a second time in the testing code, and check to see that I got the same result; which is pretty discouraging when you're doing it.

It's certainly nice once it's done, though, to be able to make a change, re-run your test suite, and have a pretty reasonable confidence that nothing broke.


Imagine assembling a car engine without any QC process, perhaps just visual inspection of the parts. The end users (whoever will drive the car) will be the actual testers.


You jest, but I have been in a conversation where I proposed a user experience improvement and heard "but if we change the code, it will no longer be tested. Right now, users have reported bugs and we have fixed all of them."

Because users always report bugs...

If you've ever been frustrated by the package manager Conda and the UX of its command-line interface ...I'm sorry. I tried.


It's a trade off. Sure, the ultimate testing is done by customers, but how fast can one push new fixes without tests? 1 reported bug could break many things.


> but how fast one can push new fixes without tests?

Well, if you refuse to write tests and you are scared to make new changes without them... then you can't push new fixes at all. Which is apparently fine for some organizations.


Testing and documentation are both essential but too much of either becomes harmful.


Welcome to the land of ambiguity and subjectivity - enjoy your stay!

The problem is that there's no way to generally quantify "too much" or "not enough" when it comes to tests and documentation.

Even "harmful" isn't a well defined term on its own and needs to be properly defined on a case-by-case basis.

The statement is a platitude without any substance behind it and you can replace "testing and documentation" with just about anything and come to the same result:

  Vitamins and calorie intake are both essential but too much of either becomes harmful.
How profound...


https://agilemanifesto.org/

> Individuals and interactions over processes and tools

> Working software over comprehensive documentation

> Customer collaboration over contract negotiation

> Responding to change over following a plan

The things on the right are only useful in service of the things on the left.


What do you suggest the alternative is?

Also, I agree that's exactly what testing is, in fact I often now test code like that, get someone else to do one simple implementation in Python, a second high-performance implementation in C++/Rust, and compare them.

Then, as long as the same bug doesn't occur in both versions (and that seems very unlikely, as they are implemented with different algorithms), we find all the bugs in both.


In general I agree. Tests are needed, but not for the sake of the tests. You need to cover the main hot paths. And that's why I prefer integration testing. And ideally automated, for example using things like GoReplay https://goreplay.org/.


One benefit of testing I haven't seen much discussed yet is that testing can make fixing bugs easier. Many times in my career, there was a bug reported in a function I am not familiar with, based on a very particular set of conditions. There is a lot of work required to simply recreate the conditions that will allow me to debug. If I have written a test, where I have created appropriate fixtures and isolated the logic, I can easily recreate the failure condition and find where my logic is broken.
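
For example, a fixture can capture the awkward setup once (a pytest sketch; `make_invoice` and `apply_late_fees` are hypothetical names):

    import pytest

    @pytest.fixture
    def overdue_invoice():
        # The reported bug only appears for partially paid invoices past 30 days.
        return make_invoice(total_cents=10_000, paid_cents=2_500, days_overdue=31)   # hypothetical

    def test_late_fee_is_applied_only_once(overdue_invoice):
        apply_late_fees(overdue_invoice)      # hypothetical code under test
        apply_late_fees(overdue_invoice)      # the bug: a second run double-charged
        assert overdue_invoice.late_fee_cents == 500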


Personal experience is great, but facts are more important.

The IBM team had 40% fewer defects than the non-TDD team; the Microsoft team had 60–90% fewer defects than their non-TDD fellows.

You can read more here: https://medium.com/crowdbotics/tdd-roi-is-test-driven-develo...


This part is enough to mark the study as exploratory and not usable as a guide to the real world:

> The experiment was conducted on “live” projects that the teams were developing at that moment.

There are plenty of experiments under those conditions. If they had consistent results, we could trust them, but they are all over the place, which means that confounding factors are more relevant than the thing being studied.


Did the non-TDD team do regular testing, or no testing at all? It doesn't say, and it seems rather important for judging the results.

I think the debate is not whether testing in general reduces defects (which is rather obvious), but if adhering to a stringent Test-Driven Development (TDD) process is better than regular testing (typically adding the tests after writing the code, instead of before).


You have to quote the cost too: "TDD teams spent 15–33% more time to write the code."


That's actually a surprisingly small bump.

The usual wisdom is that the later a problem is detected, the more it costs. 15-33% extra time for removing ~half the defects very early is a great trade-off!


Starting your implementation with a test that sets up what the expected result is, and then implementing the controllers/whatever against that test, is pretty satisfying.

In web development it also makes back-end development faster. Switching to a browser, reloading the page and inspecting the console/properties/page, is so slow.

I wasn't really into testing until a certain point, but having the confidence that your business logic still works after doing change requests is bliss.

No testing it in the browser, no testing it with postman, no checking if mails are sent in your dummy smtp server interface etc.

When you don't write software that changes very often or has many use cases, go ahead and leave them out, but the more you take on your shoulders, the better it is to automate the tedious work.


Again, there is a misconception that tests prove that a given module works. No, they don't.

Tests may serve as some kind of "persistent REPL" to make sure you didn't miss some edge case. But, at least to me, the real value of tests is confidence for refactoring.

Otherwise it is so easy to "simplify" a function but break code due to some implicit type cast, or making it work only for positive values, etc.


I work on a codebase full of crappy unit tests that require more maintenance than the software itself. There's even a test that asserts the dockerfile matches a particular regex. It's made me realise that you should only encode your specifications in the tests, i.e. check that all documented features work and that the software fails gracefully outside this range. Some may not even be full end-to-end tests; you just have to be sensible and make them as high-level as feasible.


It seems to me that someone with this philosophy has never tried to change or refactor a subsystem of a very complicated legacy project. Sure, sometimes your tests fail because they're poorly written and test implementation details, but other times they catch the fact that you didn't know about case X that had been discovered years ago and required a minor non-obvious tweak in the code to make work correctly.


I kinda agree with the core issue here: writing small, hermetic unit tests won't get you far; unit tests by themselves only get you so far. As Kent C Dodds puts it, write few tests, mostly integration. That immediately sat right with me. When you model tests based on how your software is supposed to interact, I see instant gains: I'm more confident about the code and have refactored parts of it as well.


The article doesn't mention refactoring, and that's where the value of tests lies.

Given a complete test suite and stable interfaces, you can make substantial changes to existing code and if you break anything you'll know. It's often the difference between making changes that need to be made, and avoiding those changes because there's no safety net.

I'd be curious how the author approaches refactoring.


> One way I like to consider this is to look back after a bug is found, and ask how many tests would have been needed to detect it. The minimal answer is of course one, but we can't reasonably believe that this exact test would be the next one added.

No, you can't. That's why when the bug occurs, you can add that test and then that bug never happens again.


I'm curious what kind of programs the author is writing (and not testing) here.

A few years ago, I implemented tests in a Django website I manage on the side, and am at around 90%+ "coverage". Since then, I've been able to upgrade major/minor versions, add new features, and have deployed around 100 updates. Automated testing before deploying has caught numerous small typos and errors before anything went live.

On the other hand, I'm also writing some games right now, and those don't have any tests. I can't see as clear of a reason to implement them.

Would others agree that certain areas of dev are more suited for testing than others?


> Would others agree that certain areas of dev are more suited for testing than others?

I wouldn't agree at all with that. Code is rules written in a way that can be translated by programs into a machine-executable format.

Saying that code for some areas is more suitable for testing than code for other areas would imply that the rules are different somehow.

I don't see why that would be the case. I'd rather suggest that certain development processes and -methodologies (or lack thereof) may discourage systematic testing.

The rules that govern games are often expressed on a different level and by different people than the implementation. It's therefore hard to come up with reasonable assumptions to test in the first place.

Add to that the goals are often quite different (e.g. correct code vs the software doesn't crash and runs at an acceptable performance) and you may get the impression that tests just aren't well suited for say games.

In reality, the rules of the game, playability and fun need to be tested anyway. So from a cost-perspective it can make sense to skip out on automated code tests as errors and crashes would surface during gameplay tests anyway and overall correctness isn't a requirement for games anyhow.


Integration tests become difficult when the output is visual - such as a formatted PDF document or image. The best I've managed to do is store a known-good output and regression test against unexpected changes. I work in document composition, and it's very important to our clients that a random business rule does not break the overall look and feel of the document. Unit tests are easy, but don't help with that.

When even minor differences can completely alter text flow you have to basically invert the testing process. You run the program, use your human brain to decide if it looks good, then take that output as the 'good' output for automated testing. That's in contrast to a lot of testing where you can set up the expected results as you write the tests, before the code.
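
In practice that inverted workflow can be as simple as this (Python sketch; `render_document` stands in for the composition engine):

    import hashlib
    import pathlib

    def test_statement_layout_unchanged():
        rendered = render_document("fixtures/statement.json")            # hypothetical, returns bytes
        golden = pathlib.Path("tests/golden/statement.pdf").read_bytes()
        # Any difference is flagged; when a change is intended, a human approves
        # the new output and it becomes the next golden file.
        assert hashlib.sha256(rendered).digest() == hashlib.sha256(golden).digest()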

So for me, that definitely falls under the domain of being less suitable for testing. A lot of the usual tricks don't work, and the benefits are less certain.

Since it seems like it'd be very similar in practice, I'm curious how browser devs test the visual aspect (not the parsing or DOM) of browser/CSS rendering - how do they decide the graphical result shown is 'good'?


For CSS and the like, A/B-tests are the norm.

The output is rendered to bitmap images and verified on a per-pixel basis.


A big one is that the game part of making a game requires you to test it by running the game and actually using it. My impression of other areas of development is that it's much less likely that they execute the code they write as close to an end user as that.

My other impression is that a lot of the unit testing advocates are coming from dynamic languages and need to execute their code at runtime to find basic errors like typos. I know myself, from writing in JS and Lua, how much of a pain this is.

The last part is that gameplay code changes a lot, its very much experimentally driven. In my experience of following strict processes like TDD it becomes laborious, hurts iteration time and is error prone because rewriting tests often introduces bugs just by itself.

That said integration tests for bits and pieces of the game engine are gold.


I've unit tested the core domain of puzzle games and rpg games before. It's like testing business code. It's uniquely suited to that. But they can't tell you if some visual thing looks strange.

It seems like testing in games is a huge problem, in that game QA testing is a big industry. I bet puzzle games and rpg games could gain a huge competitive advantage by testing their core domain. If they're long lasting games with lots of changes coming up.


The author should put this into context of the language being used. In languages with poor separation of interface and implementation, and limited datastructure support, tests do indeed tend to be bad. That's less a fact about testing and more a fact about those languages.


Tests should fail often. We don't want tests to fail, but we should want them to!

We almost always write tests so that they pass - we should write tests so that if someone makes some change in the future they fail easily.

Tests should be an aid, they should give you the heads up that something happened. They should be flags to help you not a game of achievement or failure.

We should call them traps. We make traps in code to trap bugs. Psychologically you can view them as games but avoid the negative connotations of failure.


I view tests as experiments: they compare our theory of how the code should work, against the reality of how our code actually works.

Writing tests for the "happy path" is like confirmation bias: we only go looking for things which reinforce what we already think. Good experiments should challenge our assumptions.

One good way to write tests is when we're debugging: the bug itself falsifies our theory, so we can use that as a test (AKA a regression test). This requires we can reliably reproduce the bug, but that's usually an important first step when debugging.

Once we've got our regression test, we might wonder how this situation arose. This is a helpful way to tease out the assumptions we're making about the code, and turn them into tests. For example, the buggy result might be calculated from some intermediate values 'foo' and 'bar', but the bug makes no sense because 'foo' and 'bar' always satisfy certain properties that would prevent the bug. Well there's two new tests we can write! We can keep working back like this until we find the cause of the bug and fix it. I like this method because we end up with tests that correspond to those features of the code that we found ourselves doubting. That's usually a good sign that those tests are worth having.


You cannot possibly avoid testing software, unless the software is literally useless.

So testing will definitely happen _at some point_. The debate is about where it happens: on the developer's computer, on a tester's computer, on a CI agent, or on the user's computer.

I like to cover as much code as possible before the software reaches the user, but that's just me.


I have two opposing thoughts on this subject.

I find that too many unit tests actually result in worse-architected code, as refactoring takes (at least) twice as long because you have to refactor the tests too. This can be mitigated a lot by seeing your tests as code that also needs to be architected well (shared code refactored out), but it is still a problem.

The flip side though is when you are writing code that is difficult or time consuming to get running. I am the lead for a telecomms system, the whole setup to get the code running requires hardware, signal generators and spectrum analysers. We have all that available on remote access but it can take time and fiddling for each code iteration. If I instead write automated tests at a couple of levels I can write a lot of code with confidence and then integrate with (i.e. test on) the hardware at the end with low risk of finding issues.


The article sounds very much like the author is writing tests after, so writing tests for the code already written.

I had very similar experience when writing tests after.

And it turned around completely when I started writing tests first.

Now the tests are not aiming at specific properties of the code, nor are they duplicating the thought patterns in the code. They are simply examples of what the code should do, and as such are documentation and formal/informal specification. Formal in that code is a formal system, informal in that they aren't actually trying to be a complete specification.

So if you're suffering the same symptoms as the author, I suggest you try test first, it's a world of difference.

(And of course all the comments about safety for refactoring, programming over time etc.)


There are probably hundreds of thousands of pages that have been published on this subject. Is there anything to be gained by reading these month-to-month anti-testing articles that consist of 10 paragraphs of enlightenment?


Sometimes people say things in different ways that resonate differently with different people and that is valuable. For example, you could have made this point without as much snark, or not commented at all - would anyone have noticed?


sure, but the comments are normally the same, "snark" or not, https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


> Tests are very brittle, breaking due to entirely innocuous changes in the code because they inadvertently embed very specific implementation dependencies. Test for a race condition? Actually a test of the scheduler's round robin algorithm.

This matches my professional experience with writing and maintaining unit tests. They are often coded to the impl and are basically worthless IMO. I think you generally can get more bang for your buck with higher level functional tests, i.e., does this change preserve the functionality of the critical paths in an e2e flow?


Integration tests can outlive refactorings. Unit tests rarely do. Delete your unit tests after your integration tests pass. At least those unit tests that heavily rely on stubbing.


Property based testing would blow this person's mind; so many of these issues are resolved by it.


Everything in this article is correct except the conclusion that it's better not to write tests. Testing is an artform all its own. I find two tricks indispensable.

First I declare a "foo.src.js" file to hold a module's logic, and then a test file as "foo.js" and have the test file import the logic and then export whatever is to be public. Client code imports the test file and not the source file, so the tests sit in between the logic and the client code, as it were. No more balancing testability against encapsulation.

The second is indirection. I create a mockability decorator. Then have a function (I call it 'mock') which can be used to temporarily redefine any function declared mockable, and then switch back to the real thing when the current test is through. So I get:

    const readFile = mockable(fs.promises.readFile)

    ...

    mock(readFile)(async () => 'mock file contents')

Combining these approaches doesn't so much prevent rework on tests when code changes as it makes it much less trouble to implement tests for 98% of cases.


a missing link is that one specialized tester on a team for every 5-8 devs makes devs a lot happier, makes tests more robust and makes the whole squad more productive, especially by reducing regressions over time by a lot, and especially in fields where the code has to run on multiple platforms and devs only ever use one.

it amazes me that we're doing the opposite of best practice by piling everything onto the mythical full stack dev, while the rest of the world moves toward specialization

this focus on forcing devs to write tests and learn to test better is highly inefficient for all parties involved, both in terms of output quality and time commitment.

and the first thing that gets cut when under time pressure is, in fact, testing. and a dedicated tester is resistant to that.

the real thing is.. almost nobody wants to pay top money for a specialist, so nobody wants to specialize in the profession.


Here’s one thing test driven development solves that has nothing to do with testing - empathetic code. As soon as you force people to write testable pieces of code, by definition you just forced them to write a good interface that considers how others will/could use it (in this case the ‘other’ is a test suite).

When that’s not in place, people often create selfish code where only their own needs are addressed.


Regression tests get around a lot of these drawbacks: they cover edge cases that have already been shown to have been overlooked in the past; and they're not unnecessarily brittle or testing things that might legitimately change because they're testing something that we know someone has relied on in the past.

They can still end up platform (or more generally, environment) specific, though.


Author writes 'In order to be effective, a test needs to exist for some condition not handled by the code.'

I think he is missing the point. In the tests I write, I capture what users do, so that before releasing new code I can gain some trust that the old behaviour is unchanged. I also often debug my new code by writing tests, especially on systems that I cannot just run my code against.


This sounds like the article should be against brittle testing.

If any of the following is true then you're writing brittle tests:

- Your test code looks like the code under test

- Your test suite includes examples that tweak one parameter and assert a result

- Your tests pass when you delete an arbitrary line of code from the implementation

I can't imagine shipping any significant project without any tests. How would I know that what I wrote implements my specifications faithfully? Hand waving and trust?

Unit tests are proof by example. They're trivial and don't prove the absence of errors. So I use them sparingly for simple, pure code where a few assumptions are enough.

Property tests are where I spend more of my time and focus. I generate the tests from a specification of a property I want to ensure will hold. It's not a formal proof that there are no errors but if my code survives 500 generated test cases and I have a good distribution over expected and hostile input I can be satisfied that my code is correct.

I don't spend time writing unit tests for effectful code. If I have business logic that is tied up calling a lot of APIs and digging into databases I defunctionalize the effect handling code and write interpreters for the data structures instead so that testing is still pure and easy and the effect handling code is constrained and found in one place. I test the pure version to make sure the interpreter receives the correct sequence of data.
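
A stripped-down illustration of that shape in Python (all names invented for the example):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ChargeCard:
        customer_id: str
        cents: int

    @dataclass(frozen=True)
    class SendEmail:
        to: str
        body: str

    def checkout(customer_id: str, email: str, total_cents: int):
        # Pure business logic: returns the effects to perform, in order,
        # instead of performing them. A separate interpreter executes them.
        return [
            ChargeCard(customer_id, total_cents),
            SendEmail(email, f"Charged {total_cents} cents"),
        ]

    def test_checkout_charges_before_emailing():
        effects = checkout("c42", "a@example.com", 1999)
        assert effects == [
            ChargeCard("c42", 1999),
            SendEmail("a@example.com", "Charged 1999 cents"),
        ]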

Sometimes I will use gold tests for serialization code. If I need to make sure that a contract I have with an external system isn't broken by my changes to the code I make sure to run the tests that will check what gets serialized out matches up with the golden examples.

If I need to ensure certain temporal properties hold like resource usage and performance... well I'll need regression and load tests.

These are all a part of the development process. You can ship code with no tests but good luck refactoring, maintaining, fixing, and understanding that code a year from now. It's possible, don't get me wrong, but it takes extreme discipline and even then that sometimes fails. People get tired, leave, get bored, etc. Having the computer check for you that everything is still working as expected is essential in my books.


I have seen too many badly written tests, so I tend to agree. Most tests will just fail on any change in the UUT, giving you lots of false positives. This ultimately results in fixing the tests instead of the UUT and blurring the initial intention of the test.

Writing testable code is not about dependency injection etc. It's about having the behaviour of a unit clearly defined. This can be done explicitly in text, but also implicitly by using the component in a certain way. If anything beyond that behaviour is tested, you start defining the behaviour in the test case. This is fine, but it often happens unintentionally, leading to a very weird and very specific definition of the behaviour.

For example, I work on a component that generates a configuration. I know that the order of lines of the output doesn't matter and should not matter. But the initial author wrote tests that fail if the order changes, because they just check the output against a string.
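
One way to avoid baking that accident into the spec (Python sketch; `generate_config` is the hypothetical component) is to assert on a normalized form rather than the raw string:

    def test_config_contains_expected_settings():
        output = generate_config(cluster="prod")          # hypothetical generator
        # Line order is explicitly not part of the contract, so compare sets.
        assert set(output.splitlines()) >= {
            "max_connections=100",
            "listen_port=8080",
        }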


Could I ask how are you defining your terms? For example, what counts as the "unit under test" (UUT) in your opinion? What would count as "behaviour"? What counts as a "component"?

I'm curious since I think there is a lot of cross-talk when it comes to the terminology around testing.


Here I used "UUT" and "component" synonymously, which is simply all the code that should be tested. With behaviour I was referring to expected output for a given input.

There is definitely a lot of cross-talk. Can you make an example where my previous comment would be misguided?


When I read your comment, my initial thought was that this sounds like a classic case of overly-isolated tests (e.g. mocking everything except for a single class, essentially re-implementing all those other components again and again). It's much less common to get spurious failures when testing at a higher level (what some might call "end to end tests", or "functional tests"; or even "integration tests", but that's a phrase I tend to reserve for interactions with third-party components).

When I read your final paragraph I realised that I've hit that same problem many times myself, e.g. checking if two things are equal, when in fact the order doesn't matter!


If you are implementing a specification (e.g. CSS/HTML/other language parser), tests are invaluable to assess how conformant your code is. They allow you to test the different parts and specifications in isolation (the lexer, the parser, the value/unit handling, etc.)

If you are implementing interfaces that plug into some other program (e.g. an IDE, or even different teams in a project), tests are invaluable to check that your code works as expected, and does not break when a new version of the program is released.

Yes, writing tests takes longer. Yes, writing tests makes it harder to change the code in the future. However, they have helped me identify cases in my code that I had not considered (when integrating my code with an IDE), and to prevent issues when refactoring and extending the code.

There are ways to mitigate the issues of changing the code, such as creating scaffolding to bridge the old and new code via adaptors, etc.


I mean this looks like the author is just trying to find a justification to not write unit tests because he doesn't like it.

I pick the last sentence "If you like writing tests and are very good at it, you may continue to do so." This basically means, well "I don't like it, I'm not very good at it so I won't do it".

What I want to answer to that is: if you are not good at it, spend time getting better. Everyone can learn what a unit test is in 5 minutes; writing good tests and good testable code takes years. Do tests get deprecated and useless, are they hard to maintain? Yes, that's right, this happens. So don't aim for any particular coverage percentage: if you aim for 80% coverage but the last 30% are a pain to write, a pain to maintain and don't bring a lot of value, then don't write them. Focus on the rest and make sure the core functionalities are well covered.


A few things about where this guy is wrong on the definitions. For testing there are:

Unit Tests - This is usually per method and on a disconnected basis (you don't have a direct call to a resource on this)

Integrations - this is a combination of other unit tests. May use stubs and or mocks to isolate the external resources involved.

Functional/feature - This is where you get embedded services involved

Product acceptance test - This tends to test the individual application from deployment to running. This tends to verify outward features with live resources behind it. (Pipeline - it's one piece of the pipeline running live)

System test - this is where you can get live test environments set up and prepped for it. This tends to test the system as a whole (Pipeline - it's the entire pipeline in a test environment). Usual definitions are:

Smoke test - a briefer run of the system test, meant to prove out that things work. It doesn't verify.


I 100% agree: tests should be made automatically.

When I am on a ruby console and I test manually, I should be able to create a test that will replicate the one done by hand. I should not have to learn a test framework.

For all these reasons I decided to only test real edge cases. Also, I now have to create that magical gem that will generate tests automatically.


The author imagines that the value of tests would be to detect errors the developer does not know about as they initially write the code. They are understandably skeptical that tests can provide that kind of value.

In my experience, the "error detecting" value that a test sometimes gives you only occurs when you revisit code. Preventing regressions is very nice, but this is still not the main value of tests.

The real benefit of writing tests comes when you write them before you write any production code. That way the tests give you feedback about your software design. Code that is difficult to test is also difficult to use correctly.

They also complain about the friction you feel when you want to change behavior but the tests assert the existing behavior. In this case the test should be changed or deleted. People are often too hesitant to delete tests.


Writing tests helps you design the code to be accessible and modular. I test the vanilla paths of our software to ensure that refactorings don't break the business. I could go far deeper and explain every little detail, but this is the most important thing about testing.


I very much agree with this sentiment, but it is difficult to endorse this line of reasoning without offering a better alternative. As folks have pointed out in other comments, there are other values to tests in the broad sense that the author doesn’t address. That said, if you are interested in improving the state of testing, I would encourage you to listen to Will Wilson’s excellent talk on the subject of Autonomous Testing: https://www.youtube.com/watch?v=fFSPwJFXVlw Indeed, FoundationDB (which Wilson worked on) has a very unique and sophisticated approach to testing that seems to have served them very well.


The title's a bit sensational. Yes, bad tests are bad. But nobody's entirely against testing in every situation.

Imagine that you're Microsoft, except you're against testing. You decide to refactor some Windows XP source a bit. What now? Just compile it and ship the result without even trying to boot it once?

Clearly that's unacceptable, so at least some testing must be okay in at least some scenarios. The question is how much and when. One or two black box tests of your program can get you a long way. Add more as needed to gain whatever degree of confidence is necessary. Remove tests that are redundant, fragile, or just exist reflexively because "code needs tests."


How does one even modify anything nontrivial without a test harness? You’d need to understand and remember every edge case in the whole system?

Take a nontrivial (but by no means enormous nor “messy”) system like the Roslyn C# compiler. It’s developed to a formal spec, which is extremely rare, but it’s likely almost impossible to add e.g. a syntax change and foresee how that affects all scenarios. I can guarantee that any nontrivial change will break some tests, which will force the author to iterate and solve the issues.

Most systems you encounter will be older, many will be much larger, messier, and not have the luxury of a spec that Roslyn has.


I worked on a legacy system that had no tests for years. Just as I left tests were slowly added. The project went from a painful quagmire to a maintainable code base in months. The transformation was unbelievable.


A general problem with writing tests as it is usually done now is that tests are written by those who wrote code being tested. Writing tests is a professional activity in its own right and requires special skills. (This is how companies like IBM used to work back in the day.) Sure, programmers want to do sanity checks on code they write, but this by no means should amount to testing in the proper sense, nor do you want to force coders to waste their expensive time on something that is not 1) generating ideas and 2) writing new code.


I always test till I find at least one bug. This approach has never failed me, and I usually find more than one. Sometimes it's a corner case, but still a condition the software should handle. I would really have to write a very tiny thing in order not to make a single bug!

Of course archiving the tests can be a pain if nothing is prepared for it. Automating the tests is yet another pain. Still I worked on a library once when I added quite a few automated tests, because the infra was basically there and just waiting for it.


Sounds like this is aimed mostly at unit tests, and the way we tend to roll testing into CI/CD.

I prefer test harnesses and “monkey testing.”

I write about that, here: https://medium.com/chrismarshallny/testing-harness-vs-unit-4...

But I am REAL BIG on testing. Yeah, it ain’t fun, but I’ve been writing shipping software for most of my adult life, and, in my experience, about 60% of shipping is boring stuff.


I used to be like this person, but then my testing suite got so gamified that I really loved getting all the checkmarks on files, all the functions covered, writing functions in ways they could be covered, and moving the percentage to max coverage!

I also really did become a better programmer from them. I used to think I knew what all the functions did and what their requirements were, but tests would often show me how a case failed.

And of course if anything changes the test reveals this too.


This is why integration tests are more valuable than unit tests. Integration tests directly test the desired output, no need to think about what to test. Integration tests flag if something is wrong, unit tests help refine which part is wrong. If all integration tests are consistently passing I honestly do not see the value in having to write unit tests as well (or at the very least they have lower priority).


While what the author is saying is true, tests provide a mechanism to check whether your code behaves in a different way than before: some tests suddenly stop working. That is different from a proof that your software doesn't contain bugs.

I would love to have a decent solution for unit tests on embedded systems with "bare-metal" firmware. There are some approaches, but the situation is certainly not optimal.


“There's some amount of discomfort I'd be willing to sustain if I felt it they were beneficial, but I also find they're rarely worth the bother.”

Also sums up my feeling of reading this author.

I only read the first two paragraphs and determined this article needs unit and integration tests.

Assert sentences:

  Have noun, verb, and subject

  Are not run-ons

  Do not contain typos that render them unintelligible


I only write broad blackbox tests that make the same actions as users and look at the end result (e.g. has the DB been updated accordingly).


while agreeing with almost all points here (hackernews never let me down), I would like to point out that tests give newcomers in particular the confidence to start contributing as soon as the development environment is up and running. So yes, tests not only help us avoid regressions, they also dramatically ramp up the on-boarding of new developers.


From my experience, easily testable code (this means applications and interfaces) is more stable and easier to maintain.

By testable code I mean code that's either easy to test or easy to write tests for. I see a lot of tests retrofitted on the assumption that the code must not change. So my rule of thumb is: if it's hard to test, then re-design it.


Any successful project I participated in (tons) sacrificed test writing but still used health checks. Any BS project I participated in (+/- the same amount) was full of linters, TDD, lots of testing bureaucracy, and disoriented people who thought all was good because there was 100% test coverage.


The amount of test code in your app is (well, should be anyway) directly proportional to the risks inherent in your process plus the lack of verification tooling in your platform (type system, preconditions, etc). When you consider them this way, they’re basically a code smell.


It doesn't need to be perfect to be perfectly useful.

Not testing because it can't be perfect is a fallacy.


I believe the fallacy is believing that since some testing is good more must be better.


I really like that. I've seen code so full of trivial unit tests, yet no integration or regression testing.

A good unit test should be written with the mindset of a hacker - how can I break it - rather than just a few assertions. It's not easy to do, and that's why there's so much bad stuff.


In my experience the best tests have the structure of a specification. Some flavor of test-first discipline is the easiest way I know of to achieve this. This way when requirements change, you change the tests such that they match the new specification. Sometimes this reveals that large chunks of the code need to be thrown out and replaced, but that's OK.


> "Certainly, one approach is to just write good tests that don't have these problems. If you like writing tests and are very good at it, you may continue to do so. "

The article should be renamed to "Against Bad Testing".

Good tests do exist ... but they are hard to write (though easy to maintain). They require a lot more software design competence than writing non-testable production-code-that-works-today-and-maybe-tomorrow. An example of good tests: have a look at the Ninja codebase.

Writing such good tests for a legacy codebase (not written with testability in mind) is even harder (and I'm not sure that it's always possible in this case. It might. But I don't have any example).

Forcing everyone to write tests when they don't have the competence yet is a recipe for disaster and false conclusions ("this testing thing doesn't work / is harmful").

PS: now that I think of it, do we have a list of well-tested public-source codebases? this could be a good pedagogical tool.


For me testing is a way to run small snippets of code without the need to spin up the whole application. I write a lot of unit tests (almost TDD) when working with java, but when working with clojure or python I mainly use the REPL to drive the code.


I honestly don't get the fuss. I must be one of the few people in the industry that actually does TDD. I write the test first as a sort of "design research" phase. This industry pays well; is it that hard to expect discipline?


The standard book on avoiding the test anti-patterns described here is http://www.growing-object-oriented-software.com/ .


Testing arose with the advent of PHP and JavaScript. Because compilers check nothing in these languages, tests must be written separately; otherwise it's impossible to make even a short program without errors.


For the past few months I've been writing property tests with Hypothesis for Python (https://hypothesis.readthedocs.io/en/latest/) and Proptest for Rust (https://altsysrq.github.io/proptest-book/intro.html). The tl;dr on property testing is that, rather than testing just one manually solved case, you describe an invariant (the example given of x + y < limit being a good one) and a method of generating random inputs, and the test runner will use that information to fuzz your code.
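
A minimal example of what that looks like with Hypothesis (here checking a JSON round-trip, just to keep the example self-contained):

    import json
    from hypothesis import given, strategies as st

    @given(st.dictionaries(st.text(), st.integers()))
    def test_json_roundtrip(d):
        # Hypothesis generates hundreds of inputs, including edge cases
        # (empty dicts, odd unicode keys) I'd never think to write by hand.
        assert json.loads(json.dumps(d)) == d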

I've found that I get better tests, and that I can write tests faster. It's also an antidote to Unangst's point that the developer who failed to imagine a test case while programming won't necessarily have an epiphany when they go to write tests.

But it isn't always a good fit with one's domain.


Can someone give some examples of well-unit-tested open source projects?

It seems like this should be googleable, but all I get when I try to find this is unit test frameworks and blog posts about how to unit test.


SQLite (https://sqlite.org/testing.html), so many times over.


Shellcheck https://github.com/koalaman/shellcheck has excellent functional unit tests


A bunch of GraphQL projects are quite well-tested e.g. the reference impl at https://github.com/graphql/graphql-js


Never finish your essays all wishy washy like that. You don’t like shitty tests. Not a very controversial or enlightening end there.

If you're gonna be against testing, then be against it.


I didn't even make it past the first sentence. What a lame opinion! Tests are provably great, especially if you want contributors to your project, or give even half a hoot about documenting behaviour. I couldn't care less if you don't like writing them. Just do it.


I just dislike tests that don't test what's actually important.

For instance, if you are writing an application, then you probably don't need to be writing unit tests for internal APIs, since they might as well be considered private entities. Just write tests for application behavior. This stance seems rare, in my experience. Every team I've worked with insists on testing effectively private logic as well as testing the application at a high level, which takes a lot of time.
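
As a rough sketch of the distinction (using Flask only for illustration; the route and helper are invented), the test below exercises the externally visible behavior rather than the private helper behind it:

    from flask import Flask, jsonify

    app = Flask(__name__)

    def _normalize(name: str) -> str:
        # Internal helper: effectively private, free to change or disappear.
        return name.strip().title()

    @app.route("/greet/<name>")
    def greet(name):
        return jsonify(message=f"Hello, {_normalize(name)}!")

    def test_greeting_behavior():
        # Test what users actually see, through the app's public surface...
        client = app.test_client()
        response = client.get("/greet/ada")
        assert response.status_code == 200
        assert response.get_json()["message"] == "Hello, Ada!"
        # ...rather than pinning down _normalize directly.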

A lot of debugging time can be saved by guarding against unusual circumstances and providing useful error messages. Unfortunately, most people treat tests as if they are the documentation for how parts of the application are supposed to behave, which I think is generally the wrong way to look at it, since tests can be difficult to decipher when you dive back into them.

The greatest sin of testing, in my opinion, is the idea that if you write enough tests you can avoid errors. The only way this works in some capacity is when you write your tests while you code (aka TDD). But the reality is that you simply can't avoid bugs no matter how meticulous you are in writing your tests. Every application I've worked on has loads of tests, and yet there are regressions every week. This is not the fault of anyone in particular, but the nature of the beast. In which case, you've got to decide whether it's worth testing all the minutiae of your application, or to spend more time on the stuff that you really care about. Every test you add contributes to wasted developer time, especially when it gets to the point that it's no longer practical for programmers to run your entire test suite locally.

EDIT: To expand upon this, while I think integration tests are better than unit tests, generally speaking, I don't think they're a good substitute for high-level application tests when writing an application (not an API, framework, or library).

Your best bet for testing the truly desired behavior, getting an understanding of how performant your application is, and measuring your app's complexity, is application tests. If your application test requires a ton of ridiculous special cases to be set up, or faking test data becomes difficult, that's a sign that your app is too complicated and that you should stop and address that before anything else. If your application tests are becoming slow, that means that the application will become slower for your users. Integration and unit tests are unlikely to capture how your app is actually going to behave or perform. If you TDD your application while writing application tests, you will quickly know whether your work is improving the app or making it worse.

Application tests, by design, are much slower than low-level tests. This is a good thing, because it forces you to make good choices, or else your tests are going to take forever to run. Things like unit tests sweep performance problems under the rug.

The problem with good application tests is that adding them to an existing app that performs poorly takes a tremendous amount of effort. If your app is well established, has tons of lower-level tests, but runs like crap and has a bunch of over-complicated inner workings, then getting new application tests in there will be painful and might not even be worth the effort to the business. Sadly, so many of the apps I've worked on suffered a great deal of rot, largely the result of flawed testing philosophy I think, which made them very difficult to improve.


> then you probably don't need to be writing unit tests for internal APIs, since they might as well be considered private entities.

The reason to write a unit test _is to try to find bugs._

If your most effective way to find bugs is to test an internal API, then you should do that.


I sincerely hope that the author of this article is not employed in a professional software capacity, or at least not in any software I rely on. As someone who was raised in the peak of the Kent Beck test-first philosophy, reading something like this is shocking to me.

Based on the declining quality of everyday software it's not altogether surprising, but it's disturbing to see it written in black and white so brazenly.


Dunno what he does now, but he worked quite a bit in static analysis, which does a much better job at discovering nasty code properties than tests do.

He found issues in code you most likely rely on, like the Linux kernel: https://linux.kernel.narkive.com/Cw4EtIti/coverity-untrusted..., and also the pretty important CVE-2010-5298 in OpenSSL.


> I was under the impression that kmalloc has an implicit bounds check as it returns null if attempting to allocate >64kb (or at least it used to). Can someone confirm/disconfirm that?

If only it had a unit test...


Don't you think it is strange that "declining quality of everyday software" is observed at the same time as testing becomes the norm?



