Hacker News
Write tests. Not too many. Mostly integration (2017) (kentcdodds.com)
331 points by AlexanderDhoore 5 months ago | 152 comments

My rule of thumb: tests cast away fear. Whenever I get that sinking feeling that I'll break things when I change the code, I write tests until my doubts disappear. It works every time.

In response to the article: it's true that "you should very rarely have to change tests when you refactor code", however, most of the time, most coders are changing the expected behavior of the code due to changed or added requirements, which is not refactoring. Tests should change when the requirements change, of course. I am not contradicting the article, only clarifying that the quoted statement does not apply outside refactoring. (Refactoring is improving the design or performance of code without changing its required behavior.)

In my experience, a lot of unit tests are written with mocks, expectations, and way too much knowledge of the implementation details, which leads to broken tests simply by refactoring, even if the behavior does not change at all.

If you test units in isolation while adhering to SRP, which results in many smaller units depending on each other to do a task, then simply refactoring without changing behavior screws up a considerable portion of your tests.

As for "tests cast away fear", that is definitely true. Whether or not the lack of fear is warranted is something else, and depends heavily on the quality of the unit tests. I've seen plenty of devs confident of their change because it didn't break any unit tests, only to discover that it broke something they forgot to test.

Most 100% code coverage unit tests break Liskov substitution / interface boundaries.

But I have seen how really deep unit tests enable rapid deployment of code to prod, since they produce a high likelihood of correctness.

However, the refactoring cost and the LoC bloat are also gigantic.

And we still needed integration tests, and forget about unit tests if there are a lot of network boundaries and concurrency. It might help, but it starts to fail quickly. The facades/mocks just assume too much.

If you are in Java, for god's sake use Spock/Groovy even if your mainline code is Java.

In my experience, if you are baking in implementation details into your unit tests, they are necessarily going to be fragile. It is best to focus on input and expected output rather than the inner workings of the function under test.

It is analogous to moral hazard. People with insurance take greater risks, and vice versa.

> It is analogous to moral hazard. People with insurance take greater risks, and vice versa.

Programming with guard rails :)

I find it helps when I consider the tests to be part of the code. If I need to change existing functionality, of course I'll need to change the tests so they test the new requirement(s). If I'm adding new functionality I add new tests to test the new requirement, but I shouldn't break any of the old tests when I do that. If I'm changing code with no requirements changes (as in, pure refactoring, not "tidying up as part of new feature development"), all the existing tests need to pass unchanged...

I always see unit tests as a snapshot of dynamic behavior by 'recording' the tested logic. I make sure I'm aware of any logic changes since that change could eventually break the desired result.

Whenever I'm vetting in-/output values to a given set of parameters, I do move from white box (unit tests) to grey box testing.

When I'm done with grey/white box tests, I do make sure integration works as expected.

Why all the hassle of moving through 'the onion'? I wanna make sure to detect misbehaving/unexpected logic as quickly as possible. Tracking down a malfunction detected while running integration tests takes way more time than catching it at the onion's innermost layer (unit tests).

When the result of the function under test is nontrivial, but JSONable, I make it literally a recording: set your favorite JSON tool to pretty print and compare the output to the expectation in a string literal or file. Any change will be easy to inspect, diff viewers are great. The expected counterpart may or may not start out to be manually authored. After changes, it's usually much easier to inspect the diff and copy the approved version or edit a copy of partially correct output.
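A minimal sketch of the "literal recording" idea above, in Python. `build_report` and its fields are made up for illustration; the only real API used is the standard library's `json.dumps` with `indent` and `sort_keys`, which keeps the text stable so a diff viewer shows exactly what changed.

```python
import json


def build_report(orders):
    """Hypothetical function under test: summarizes a list of orders."""
    total = sum(o["qty"] * o["price"] for o in orders)
    skus = sorted({o["sku"] for o in orders})
    return {"count": len(orders), "total": total, "skus": skus}


def to_snapshot(value):
    """Pretty-print with sorted keys so the recorded text is deterministic."""
    return json.dumps(value, indent=2, sort_keys=True)


# The approved recording. It could equally live in a file next to the test.
EXPECTED = """{
  "count": 2,
  "skus": [
    "A1",
    "B2"
  ],
  "total": 35
}"""


def test_report_matches_snapshot():
    orders = [
        {"sku": "B2", "qty": 1, "price": 15},
        {"sku": "A1", "qty": 2, "price": 10},
    ]
    assert to_snapshot(build_report(orders)) == EXPECTED
```

When the output changes on purpose, the workflow is the one described above: inspect the diff, then paste the approved output over `EXPECTED`.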

Yes, I do exactly the same to make sure the output stays the same. Snapshotting is awesome (for Go, cupaloy works quite well [1]; Jest for React is great) but does not give me the chance to 'record' the underlying logic dynamics. That's why I think unit tests are still an important tool at hand.

[1] https://github.com/bradleyjkemp/cupaloy

I prefer to speak about "confidence", but I agree with this point quite a bit. If I am making changes and am not certain whether they will cause breakages, I'll manually test things and codify those manual tests as integration/unit tests so that I never need to write them again. Then in the future I can modify code in the same neighborhood with confidence that any breakages I cause will be caught by my tests. Add in a willingness to liberally add regression tests for any errors that do make it through, and I think this approach can really decrease the labour required to make changes, but it does front-load more cost.

Strict static types also go a really long way toward doing this. (And I'd argue this is more cost effective than tests.)

I agree static types can help, but there are many kinds of errors not caught by static typing. Did you misspell a key while encoding to JSON? Did you add two values where you should have subtracted? Did you forget to log an error (and did you log everything you should)? These are the kinds of errors I face every day and it would be burdensome to prevent them with static typing alone.

These days I can't imagine writing any kind of secure software without writing tests, regardless of static typing. Static typing does not increase my confidence in code very much.

Programming language tools help get rid of classes of bugs. GC languages help get rid of or reduce significantly classic security bugs such as buffer overflows, use after free and so on.

Static types get rid of other kinds of bugs and unit tests you don't have to write as a result. It, of course, does not solve everything or remove the need for tests.

Rust's borrowing system helps get rid of multithreaded data race bugs, something traditionally hard to write any sort of fast unit tests for.

Other static analysis tools also help you not write tests by testing certain things on everything. Even linting is another kind of automated meta-testing tool.

Also, for your JSON 'string typing' example, that is a hint that maybe you should use statically defined models instead of relying on unit tests to catch typo bugs?

Not saying you are wrong, but I think your examples are all a bit interesting (in a good way):

Misspelling a key while encoding to JSON: In a system with JSON in it, the JSON part of the system I would consider to be a part with dynamic types. So, that errors can be introduced in a part of the system that is using something resembling dynamic types can be seen as the result of moving away from static types.

Adding two values where you should have subtracted: In general, this can't be detected by static typing, but there are also examples where it can (for example - try adding two pointers in C - this is a type error because only subtraction is supported)

Forgetting to log an error: Personally, I don't see this as practical at the moment, but if the popularity of Rust means that linear typing becomes a bit more "normal" to people, then having errors where it is a type error to not use them is a possibility.

Personally, I'm a believer in using languages with simple static type systems, but pretending they have fancier type systems by the way of comments and asserts. Sure, a compiler can't check the invariant you mentioned in your comment, but with the language tech we have at the moment (and I would guess for the foreseeable future) a typical human can't understand a moderately complicated invariant that a compiler can check.

Could you replace/change your implementation bugs into type bugs?

If you are parsing known JSON schemas, defining them as types instead of generic dictionary lookups makes this a type error, not an implementation error.
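As a rough illustration of that suggestion, here is a sketch in Python using standard-library dataclasses. The `Product` and `Price` shapes are invented for the example. Note the limits: dataclasses reject unknown or missing field names when the object is built, but they do not validate value types at runtime; catching a misspelled attribute such as `product.nmae` before the code runs needs a static checker such as mypy.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    amount: int      # hypothetical field: minor units, e.g. cents
    currency: str


@dataclass(frozen=True)
class Product:
    name: str
    price: Price


def parse_product(text: str) -> Product:
    """Convert JSON text into typed objects at the boundary of the system."""
    raw = json.loads(text)
    # A misspelled or unexpected field name raises TypeError right here,
    # instead of surfacing later as a failed dictionary lookup.
    price = Price(**raw.pop("price"))
    return Product(price=price, **raw)


product = parse_product(
    '{"name": "kettle", "price": {"amount": 1999, "currency": "EUR"}}'
)
print(product.price.amount)
```

After parsing, the rest of the code works with `product.name` and `product.price.amount` rather than string keys, so editor completion and a type checker can both help.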

I can’t see why anyone would want not to do this.

Genuine question - how would I do this in VB.net? Is it as painful as I imagine, with a lot of classes?

I don’t think that’s painful at all. Yes you define it as a root class, with dependent classes as needed. This is the contract. It needs to be defined somehow, and IMO this is as good a way as any.

From there on you just use JSON.NET to convert your JSON-string to an instance of the root type, literally one line of code.

And after that you get code-completion and type-inference and compile-time checking for all data-object access.

It’s an absolute no-brainer.

Thanks! I need to look into that.

Out of curiosity - do you write asserts?

Of course. :-)

Static types do complement testing quite nicely; and more involved sorts of verification, for those prepared to do it.

> I'd argue this is more cost effective than tests

It depends on when the types are introduced.

Choosing to write a project in a strong statically typed language might be quite low cost. This often depends on library availability, e.g. pick a language like Python and there might be off-the-shelf libraries to do most of the heavy lifting; pick, say, StandardML and this will probably need to be written in-house.

Trying to add static types to an existing project might be quite difficult, and may give us hardly any confidence in the code. For example, if we're adding optional/gradual types to our code, but that information keeps getting lost when passing data through an untyped third-party library.

It seems like a lot of the logic behind "tests shouldn't break due to a refactor" presumes that tests only test the publicly facing endpoints into the code. The tests that do the best job of reassuring me are tests against code in utility functions and the like. It's hardly a refactor if none of that changes.

However if you test too much of your "internal" code, you just end up cementing the currently implemented logic.

I often see tests that are basically checking if the code didn't change. I.e. checking if function calls (on injected (mock) objects) have been done in the exact specific order, instead of checking for some correct end result.
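To make the contrast concrete, here is a small Python sketch. `checkout` and its two collaborators are hypothetical; `MagicMock`, `call` and `mock_calls` are from the standard `unittest.mock` module. The first test pins the exact sequence of calls on injected mocks, the second only checks the end result.

```python
from unittest.mock import MagicMock, call


def checkout(cart, pricing, tax):
    """Hypothetical code under test: total price of a cart including tax."""
    subtotal = pricing.subtotal(cart)
    return subtotal + tax.for_amount(subtotal)


def test_checkout_call_order():
    # Brittle: asserts which collaborator methods were called, and in what order.
    deps = MagicMock()
    deps.pricing.subtotal.return_value = 100
    deps.tax.for_amount.return_value = 20
    checkout(["book"], deps.pricing, deps.tax)
    assert deps.mock_calls == [
        call.pricing.subtotal(["book"]),
        call.tax.for_amount(100),
    ]


class FlatPricing:
    """Simple fake: every item costs 50."""
    def subtotal(self, cart):
        return 50 * len(cart)


class TwentyPercentTax:
    """Simple fake: 20% tax, integer arithmetic."""
    def for_amount(self, amount):
        return amount // 5


def test_checkout_total():
    # Checks behavior only: two items at 50 each plus 20% tax.
    assert checkout(["book", "pen"], FlatPricing(), TwentyPercentTax()) == 120
```

If `checkout` were later refactored to ask the collaborators in a different way while returning the same total, the first test would fail and the second would keep passing.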

If you have a user facing component that uses utility functions, let's say a component that shows a table with a sum (that uses a utility function). Then refactoring your sum function should not break your tests of the table component.

I'm not saying you shouldn't test your sum function using unit tests, but ultimately users don't care so much about the sum function, they care that they have a table, and it should show a sum, and so that functionality should be tested.

Users care if the table is broken because it's taking too long to find the bug from endpoint-only tests

Your endpoint test would presumably make sure the table is working.

That's what I'm saying. But it only tells you that it is broken, not why. Users don't use tests, tests are for the developers.

The level of experience of the people writing (and maintaining) the code is also a factor, I think. As other commenters have said, it's all about risk reduction. I definitely agree that you can get a lot of value writing integration tests. At the same time I'm slightly concerned that if I'd read this article when I was first starting out as a developer I'd have thought unit tests were a side-note or a chore, rather than the building blocks for an application.

Doesn't a lot of refactoring involve change in responsibility between classes? At least I find that a lot in my code. In those cases, the interface of those classes might change (as the responsibility shifts elsewhere), which will of course cause test changes.

My understanding is that refactoring may require changes to unit tests, but not integration tests. On the other hand, there have been long, unfruitful Internet discussions about the precise definition of refactoring, and I find it's better to just find an agreement among your immediate peers. :-)

There are some who treat the phrase "public interface" as meaning the parts of a class which use the `public` keyword.

There are other people who treat the phrase "public interface" as meaning whichever way that users interact with the system, e.g. commandline arguments, HTTP request parameters, exposed modules of a library, etc.

Sounds like you're using the first terminology and others are using the second.

(I wrote about this at http://chriswarbo.net/blog/2017-11-10-unit_testing_terminolo... )

Why are you testing how the responsibility is divided between classes?

Yes, sometimes you have to make an unclean cut into your code to test something. But this is not the default situation. That's why the article has that "mostly integration" part in its title.

I think his argument was that unit tests don't hold up to "real-world" refactorings, because most of the time you change the interface of your classes. Unit tests work nicely if you have one fixed interface and you just refactor some specific implementation of that.

That’s a nice way to put it.

> I’ve heard managers and teams mandating 100% code coverage for applications. That’s a really bad idea

I hope many managers and programmers out there don't take this the wrong way. I've been an engineer on a project that was attempting to get 100% code coverage on a piece of software I was writing. I heard constant remarks during this period that were similar to "You don't need 100% code coverage, it doesn't do anything!" These engineers who I was working with had only read articles like this and didn't stop to think about what the article was trying to say. From my experience there is no safe rule of thumb for how many tests should be implemented for a project. There should be just enough to feel safe (as hathawsh has said). If you're recommending to engineers on your team to stop implementing tests when they say "I'm at 60% coverage and it'll take ~2 days to get to 100%" I'd really hope you take the time to understand why they want 100% coverage.

The software I was working on when my coworkers started telling me to stop writing tests was code designed to trigger alarms when a patient's vital signs met certain criteria set by doctors. I am very thankful that I did hit 100% coverage because between 60% and 100% there were many small edge cases that, had they caused a death, I wouldn't have been able to sleep well. Had they said I was explicitly disallowed from working on the tests I would have come in on a weekend (or taken PTO) and implemented them then. It's our ethical responsibility to know when and where paranoia is worth the marginal time penalty.

> ...trigger alarms when patient's vital signs met certain criteria set by doctor

Most of us aren't working on life or death code like this. My React app doesn't need 100% code coverage but you put it well when you said "There should be just enough to feel safe"

What I really want to drive home is that you should listen to the engineers who are working on the project. Testing is one of those things that is entirely subject to project needs. Like you say, a React app doesn't need 100% coverage, neither does code running a CI, something far out of the serving path, etc. But we shouldn't have a knee-jerk reaction to an engineer saying we should have 100% coverage.

> between 60% and 100% there were many small edge cases that, had they caused a death, I wouldn't have been able to sleep well.

But you're not saying the right words here. 100% doesn't mean you've covered every edge case (although I suppose 60% necessarily means you _haven't_). I can hit 100% without actually asserting anything.

I think it's harmful to talk about 100% without also considering mutation testing, boundary analysis, manual mutation testing...

If the code was open source I would show it to you. There are definitely cases that weren't covered during the initial creation of the service that came up in QA/prod. The entire process was a war story in and of itself. The tests aren't perfect but I have done everything in my power to make sure they cover all of the use cases that appear, samples of bad data from production, samples of bad data I forced to happen in other services, asserting failure cases, and (most importantly) are vigilantly kept up to date. I also, on my own initiative, found people internally to help run a small QA study where they interacted with this system on a daily basis for about 2 months.

The senior engineers were not supportive of other forms of testing in any way (even the user testing). They flat out refused my proposal to attempt to create an integration test suite to cover its communication to other services as well as other services consuming its data (this I started over the weekend and was told to proceed no longer).

tl;dr: You're absolutely correct but I still think my testing procedures were better than nothing.

>I still think my testing procedures were better than nothing.

I am sure they were, but you also said:

>take the time to understand why they want 100% coverage

which to me indicates that they are under the false impression that this gives some sort of completeness, i.e. "make sure this plastic fence covers the entire stretch of mountain road".

We have a coverage threshold of 90% and it is helpful because it works as a reminder if you try to check in code that isn't covered by tests. I think that pushing that to 100% would add little value but a lot of overhead. Never tried it though so I fully admit that it's just my hunch.

I feel for you. I argued for unit testing for ten years before they were instituted and even then it was half hearted.

> It's our ethical responsibility to know when and where paranoia is worth the marginal time penalty.

Exactly. You are in the tiny minority writing literally life-and-death code, and for which a lot of other common advice given regarding general software development likely does not apply either (like "move fast and break things".) Also, for your type of application I would probably want 100% state space coverage too.

My problem with "100% coverage" is Goodhart's law ( https://en.wikipedia.org/wiki/Goodhart%27s_law ).

Keep adding tests as long as they tell us something new, give us more confidence, document some regression, etc.

Keep adding tests if they're exposing edge cases (e.g. MAX_INT), even if those code paths are already covered.

Keep adding tests if they're asserting useful things about the results, even if those code paths are already covered.

Stop adding tests when you're only trying to make the coverage number increase.

I had a recent project where:

- 0-80% was adding useful tests.

- 80-98% was mostly useless.

- 98-100% found some really interesting bugs and forced refactoring that made testing easier, and was definitely worth it.

Not to take away from your point, just to remind people:

100% coverage still does not mean 100% of all cases tested - it just means that every line has been run with SOME data and every conditional statement was run once in both directions.

Seems like a very specific concern in a very specific line of business. The vast majority of engineers write CRUD apps for business clients that mostly care about whether it completely explodes or not.

It also depends on the service. Some services are critical infrastructure and should have their edge cases tested. Some services provide non essential functions and can get away with less.

I agree that there is no magic number, but 100% coverage is clearly overkill for the average engineering team. In fact, broadly demanding any code coverage percentage is probably oversimplifying the issue. It just shouldn't be 0 :)

Really reliable software does full MC/DC testing, but even that doesn't catch all mistakes. SQLite for example does this but there are still bugs discovered occasionally. It also leads to insane amounts of test code compared to the actual application code. That is much too expensive to maintain for most projects.

I'd never thought about this before, but you can definitely flip this around to a pathological case of doing the opposite of striving for 100% coverage.

The problem with 100% coverage that I think of is that the last 30% is often boilerplate code (generated getters/setters in Java and that kind of thing).

But what if an engineer started with the boilerplate and only then progressed to tests that are actually important? You might get to 30% without testing anything useful at all. Then you might test half of the important stuff and hit an arbitrary metric of 70%.

If someone were ruled by the SonarQube score, they might even do this.

Yeah, thinking about the why matters more than the %.

So many problems with "100% coverage". Even if you measure it as branch coverage, it only indicates how much of the code that is already written has been covered. It doesn't tell you how much code is missing, e.g. guards against malicious input, null-checks or checks for other out-of-range arguments. This reason alone should be obvious enough for anyone to understand that the metric is completely bogus.

As others mentioned it also doesn't say anything about state space coverage or input argument space coverage or all combinatorial paths through your code. Only that some line/branch has been hit at least once.

100% code coverage is a breadth-first metric, and it generally comes at the expense of depth-first testing, especially of the "core loop" (as in, 80% of the time is spent in 20% of the code).

The decidability/halting problem hints that perfect testing is an impossibility for systems of any complexity, as in not just too big of a big-O... it's flat out not possible on a Turing machine of any power.

That doesn't mean give up all testing, but there is a LOT of dogma in testing philosophy.

I agree with others - mission critical systems (nuclear plants, aviation, medical software etc) have quite different standards for quality.

Excellent comment, thanks for sharing. One ought to really think about why tests are needed and what value they add, and not just simply follow some random (thoughtful) article on the internet.

I'm a huge fan of tests that ensure the app doesn't break in a significant way in production. How many tests that is depends on the situation. For your case I would agree that 100% test coverage is correct.

It might be obvious to some, but for reference the title (and associated tweet) is a reference to Michael Pollan's oft-quoted advice for eating healthy: "Eat food. Not too much. Mostly plants."


Which, just in case it isn’t obvious, is a timeless classic. Even if you disagree with the advice itself, the argument he makes and the brief survey of the history of nutritional science make it a worthwhile read.

As this article shows, the mode of his argument has also been influential: it is a perfect case study of skepticism without cynicism; or considering competing arguments while avoiding the trap of “bothsiderism”. Matt Levine, another writer held in high regard in this community, is very similar in this regard.

There is also a book expanding on the article.

This is the first time I'm reading (and "hearing") the phrase "bothsiderism" :) I love it! Definitely bothsidering to use it from now on!

Bothsiderism is essentially the outcome of what is in my mind the most infamous vendor lock-in move of all: the FCC's equal time rule. By requiring broadcasters give equal time to both sides, the ruling American parties (Democrat and Republican) effectively granted themselves a duopoly on mass media in the US and excluded all competing views. This is also the kernel of truth at the origin of the highly manipulable term "mainstream media".

I'm just reading the wiki, but it sounds like the equal time rule applies to any candidate who wants to buy air time, not just a major party one. Am I misreading it?


What you describe doesn't sound like "bothsiderism", just a two-party system. I don't see how the equal-time rule would be connected to bothsiderism.

As I understand it, "bothsiderism" is the practice of not taking a stance for either side of an argument and saying "both sides have valid arguments"/"both sides are equally wrong", and in the process ignoring that one side is much more right/wrong than the other.

> I think there's blame on both sides, you look at, you look at both sides, I think there's blame on both sides, and I have no doubt about it, and you don't have any doubt about it either.

- Donald Trump after violence at the Charlottesville rally

Does this have a name? Like the overuse of "X considered harmful" after Dijkstra, or the annoying increase of "X Y and where/how to Z them" after JK Rowling?

I don't think it's on the same level with "considered harmful" quite yet but definitely heading there. BTW, two more for your list:

  - Everything you always wanted to know about X (but were afraid to ask)
  - X, or how I learned to stop worrying and love the Y

Snowclone is a popular term for it:


Was hoping the article would reference it, clearly a nod to Pollan!

With intermittent fasting concepts, now it is more like "Eat food. Not too frequently. Mostly plants."

I think one mental model for tests is that they're simply another kind of automation.

The fundamental question of automation is: will I repeat this task enough such that the amortized saving outweighs the cost of writing a script to do it?

Whenever you're thinking of adding a test X, quickly consider how often you and your team are likely to need to manually test X if you don't. Also factor in the cost of writing test X (though that's tricky because sometimes you need to build test architecture Y, which lowers the costs of writing many tests...).
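The amortization question can be written down as simple arithmetic. The sketch below is only a back-of-the-envelope model with hypothetical parameter names; it ignores maintenance, shared test infrastructure and the cost of bugs that slip through, all of which matter in practice.

```python
import math


def break_even_runs(write_cost, manual_cost_per_run, automated_cost_per_run=0.0):
    """Smallest number of runs at which an automated test has paid for itself.

    All arguments are in the same unit (for example minutes) and are
    estimates supplied by the caller. Returns None when automating never
    pays off, i.e. when a run is no cheaper automated than manual.
    """
    saving_per_run = manual_cost_per_run - automated_cost_per_run
    if saving_per_run <= 0:
        return None
    return math.ceil(write_cost / saving_per_run)


# Two hours to write the test, 10 minutes per manual check,
# 1 minute of attention per automated run:
print(break_even_runs(write_cost=120, manual_cost_per_run=10, automated_cost_per_run=1))  # 14
```

Under those made-up numbers the test pays off after 14 runs, which for code that changes weekly is a few months and for stable code may be never.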

If it's a piece of code that's buggy, changing frequently, brittle, or important, then you're likely to need to validate its behavior again and again. It's probably worth writing a unit test for it.

If it's an end user experience that involves a lot of services maintained by different teams and tends to fall apart often, it's probably worth writing an integration test instead of having to keep manually testing every time the app goes down.

If it's an API with lots of important users using it for all sorts of weird edge cases and you don't want to have to manually repro each of the existing weird edge cases any time you add a new one, it's probably worth writing some black box tests for it.

But if it's code that's simple, boring, stable, short-lived, or unimportant, your time may be better spent elsewhere.

Another model is that your tests represent the codified understanding of what the system is. This can be very helpful if you have a lot of team churn. How do you make sure a new person doesn't break something when "something" isn't well-defined somewhere? Tests are a great way to pin that down. If this is a priority, then it becomes more important to write tests for things where the test actually very rarely fails. It's more executable system specification than it is automation.

I agree with the idea that we're mostly automating manual work.

The really nice thing about automated tests is that they can also go much further than I would ever bother to manually. For example, I might manually check that a couple of product pages are showing, with an image, price and description. With automated testing I can check all product pages, every time. It can also test a tedious amount of interactions, e.g. various interleavings of "add to cart", "remove from cart", "checkout", "press back button", etc.
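As a sketch of the "tedious amount of interactions" point: the Python below enumerates every sequence of two actions up to a given length and checks each against a trivially simple model. The `Cart` class is a hypothetical stand-in for the real system; `itertools.product` is from the standard library.

```python
from itertools import product


class Cart:
    """Hypothetical cart, standing in for the real system under test."""

    def __init__(self):
        self.items = {}

    def add(self, sku):
        self.items[sku] = self.items.get(sku, 0) + 1

    def remove(self, sku):
        # Removing from an empty cart is a no-op rather than an error.
        if self.items.get(sku, 0) > 0:
            self.items[sku] -= 1

    def count(self):
        return sum(self.items.values())


def test_every_short_interleaving(max_len=6):
    """Check all add/remove sequences up to max_len; returns how many were run."""
    checked = 0
    for length in range(max_len + 1):
        for sequence in product(("add", "remove"), repeat=length):
            cart, expected = Cart(), 0
            for action in sequence:
                getattr(cart, action)("A1")
                # Independent model of the item count: it never drops below zero.
                expected = expected + 1 if action == "add" else max(expected - 1, 0)
                assert cart.count() == expected, sequence
            checked += 1
    return checked
```

With the default length this runs 127 sequences, far more than anyone would click through by hand, and the failing sequence is reported in the assertion message.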

Manual testing still gives me confidence that automated tests can't, e.g. that an "unknown unknown" hasn't taken things down. Yet that's all I need to check manually; if pages are loading and look right, I don't feel a need to manually check all of the ecommerce arithmetic, permission logic, etc.

I so agree with this. I've been fighting this exact battle at work lately. People on my team have decided to take testing seriously, which is fantastic, but many team members' understanding of what that means is still at the "watch unit-test coverage numbers go up" stage. So let me be very clear where I stand.

* 100% unit-test coverage is a garbage goal. *

I don't hate unit tests. They can have enormous value shaking out edge cases in a self contained piece of code - usually a "leaf" in the module dependency graph - and making it future proof. Love it. However, unit tests don't tell you anything about whether the module's behavior in combination with others leads to a correct result. It's possible to chain a bunch of unit tests each with 100% coverage together, and still have the combination fail spectacularly. In operations, a lot of bugs live in the interstices between modules.

Even worse, 100% line coverage means a lot less than people seem to think. Let's say one line to compute an expression will compute the wrong value for some inputs. You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case. I see this particularly with ASSERT or CHECK types of macros, which get counted toward 100% coverage even though no test ever exercises the failure case.
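A tiny Python example of a fully covered line that is still wrong for some inputs. The function is deliberately incomplete for illustration; `calendar.isleap` from the standard library is used only as a reference for the correct answer.

```python
import calendar


def is_leap_year(year):
    # Deliberately incomplete: ignores the century rules (1900 was not a leap year).
    return year % 4 == 0


def test_leap_year():
    # These two cases execute the only line in the function,
    # so line coverage for is_leap_year is 100%.
    assert is_leap_year(2024)
    assert not is_leap_year(2023)


test_leap_year()
print(is_leap_year(1900), calendar.isleap(1900))  # True False
```

The test suite is green and the coverage report is perfect, but the one input that reveals the error was never tried.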

Striving too hard for 100% unit test coverage often means a lot of work - many of our unit tests have 3x more tricky mock code than actual test code - to get a thoroughly artificial and thus useless result. The cost:benefit ratio is abominable. By all means write thorough unit tests for those modules that have a good ratio, but in general I agree with the admonition to focus more effort on functional/integration tests. In my 30+ years of professional programming, that has always been a way to find more bugs with less effort.

"You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case."

Ironically it's actually easiest to have 100% coverage in worse code, because the more entangled and coupled your code is, the more likely you are to hit branches that are not under test.

Great observation. I'm going to use that. :)

Specifically this

> Even worse, 100% line coverage means a lot less than people seem to think. Let's say one line to compute an expression will compute the wrong value for some inputs. You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case. I see this particularly with ASSERT or CHECK types of macros, which get counted toward 100% coverage even though no test ever exercises the failure case.

I'd rather have 30% test coverage where the lines under test are the complicated ones (not things like factories) with the testing of those lines hitting all the complex edge cases than 100% test coverage that confirms that all your source code is encoded in UTF-8.

Coverage driven testing is (almost) always (mostly) evil. Coverage as a metric is great. Maybe you're not going for 100%, but it does tell you something about your test set(s), providing a decent measure of "doneness". However, all of this depends on good test cases. Test cases that test something. That might sound obvious, but it's fairly easy to achieve 100% coverage while testing nothing. By the way, the reason I said "almost" and "mostly" before is that you can find bugs while attempting to improve coverage, provided you take a step back, forget about achieving coverage, and instead write good tests that happen to get you the coverage. There's a lot of temptation there and it's not an overall good strategy, but you'll find stuff.

If you think 100% code coverage with unit tests is bad you should see what happens when it's done with integration tests. I'm refactoring some tests now that were written with code coverage in mind, apparently with bonuses tied to the coverage stat. I think I've seen every testing anti-pattern possible in just this one group of ~25 tests.

There's the developers not understanding the difference between unit and integration tests. Both are fine, but integration tests aren't a good tool to hit corner cases.

Many of the tests don't actually test what they pretend to. A few weeks ago I broke production code that had a test specifically for the case that I broke, but the test didn't catch it because the input was wrong, but the coverage was there.

Most of the tests give no indication of what they're actually testing, or a misleading indication, you have to divine it yourself based on the input data, but most of that is copy/pasted, so much of it isn't actually relevant to the tests (I suspect it was included in the overall test coverage metric).

The results of the code defined the test. They literally ran the code, copied the file output to the "expected" directory and use that in future comparisons. If the files don't match it will open a diff viewer, but a lot of things like order aren't deterministic so the diff gives you no indication of where things went wrong.

Many tests succeed, but for the wrong reason: they check failure cases but don't actually check that the code failed for the right reason.
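A small Python sketch of that anti-pattern, combined with the "wrong input" problem mentioned earlier. `parse_port` and both test functions are made-up names for illustration only.

```python
def parse_port(text: str) -> int:
    """Parse a TCP port number, rejecting values outside 1..65535."""
    port = int(text)  # raises ValueError for non-numeric strings
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def sloppy_test_rejects_out_of_range() -> bool:
    # Meant to exercise the range check, but the input is wrong (None
    # instead of "70000") and any exception counts as success.
    # int(None) raises TypeError, so this "passes" without ever
    # reaching the range check.
    try:
        parse_port(None)  # type: ignore[arg-type]
    except Exception:
        return True
    return False


def strict_test_rejects_out_of_range() -> bool:
    # Checks the exception type and the message, so it only passes
    # when the failure happens for the intended reason.
    try:
        parse_port("70000")
    except ValueError as exc:
        return "out of range" in str(exc)
    return False
```

Both functions report success today, but only the strict one would start failing if someone deleted the range check.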

Some tests "helpers" are actually replicating production code, and the tests are mostly verifying that the helpers work.

Finally, due to some recent changes the tests don't even test production code paths. We can't delete them because it will reduce code coverage, but porting them to actually test new code will take time they aren't sure they want to invest.

/end rant

Wow. Yeah, that sounds awful. You're absolutely right that pursuing 100% coverage in integration tests is bad too, perhaps even worse. I just haven't seen that in my own direct experience. Having too few requirements around integration tests seems like a far more common problem than having too many.

Before you battle too hard, let me introduce you to another way of thinking. It may not be to your liking, but I hope you'll find it interesting nonetheless. I'll try to keep it as short as I can. Usually when I type this same post, it takes me a while, but I'm getting better at it.

Imagine that the word "test" is a misnomer, when talking about unit tests. Often people think about testing as a way of checking whether or not the code works properly. This is great for what is known as "acceptance testing". However, as you no doubt agree, it's not so great with "unit testing".

For some reason, people hang on hard to the words "unit" and "test" and come to the conclusion that you should take a piece of code (usually a class), isolate it and show that the class does what it is supposed to. This is a completely reasonable supposition, however in practice it doesn't work that well (I will skip the discussion, because I think you're already in agreement with me on that front).

Instead, imagine that "unit" refers to any piece of code (at any level) that has an interface. Next imagine that "test" means that we will simply document what it does. We don't necessarily worry ourselves about whether it is correct or not (though we wish it to be correct). We just write code that asserts, "When I do X, the result is Y".

At the macro level, we still need to see if the code works. We do this either with automated acceptance tests, or manual testing. Both are fine. When the code works to our level of satisfaction, you can imagine that the "unit tests" (that are only documenting what the code at various levels is doing) are also correct. It is possible that there is some incorrect code that isn't used (which we should delete), or that there are some software errors that cancel each other out (which will be rare). However, once the code is working on a macro scale, in general, it is also working on a micro scale.

Let's say we change the code now. The acceptance tests may fail, but some of the "unit tests" will almost certainly fail (assuming we have full "unit test" coverage). If they don't there is a problem because "unit tests" are describing what the code is doing (the behaviour) and if we change the behaviour, the tests should fail.

For some types of unit testing styles (heavily mocked), often the unit tests will not fail when we change the behaviour. This means the tests, as a long lasting artefact, are not particularly useful. It might have been useful for helping you write the code initially, but if the test doesn't fail when you change the behaviour, it's lost utility. Let's make a rule: if the test doesn't fail when the behaviour changes, it's a "bad" test. We need to remove it or replace it with a test that does fail.
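Here is one way the "bad test" rule could look in Python, using the standard `unittest.mock` module. The classes `PriceCalculator` and `ChangedCalculator` and the two check functions are hypothetical; the second class stands in for "somebody changed the behaviour".

```python
from unittest import mock


class PriceCalculator:
    def tax(self, net: float) -> float:
        return round(net * 0.20, 2)

    def gross(self, net: float) -> float:
        return round(net + self.tax(net), 2)


class ChangedCalculator(PriceCalculator):
    # Simulates a behaviour change: the tax rate went from 20% to 25%.
    def tax(self, net: float) -> float:
        return round(net * 0.25, 2)


def heavily_mocked_test(calc_cls) -> bool:
    # tax() is replaced by a mock, so the test only proves that
    # gross() adds whatever number the mock returns.
    calc = calc_cls()
    with mock.patch.object(calc, "tax", return_value=20.0):
        return calc.gross(100.0) == 120.0


def behavioural_test(calc_cls) -> bool:
    # Uses the real tax(), so it documents the actual behaviour.
    return calc_cls().gross(100.0) == 120.0
```

Run against `ChangedCalculator`, the mocked test still passes while the behavioural one fails, which is exactly the signal the rule asks for.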

The other problem you often run into is that when you change one line of code, 200 tests fail. This means that you spend more time fixing the tests than you gained from being informed that the test failed. Most of the time you know you are changing the behaviour, and so you want to have very little overhead in updating the tests. Let's make another rule: Unit tests must be specific. When you change specific behaviour only a few (on the order of 1) tests should fail.

This last one is really tricky because it means that you have to think hard about the way you write your code. Let's say you have a large function with many branch points in it. If you give it some input, then there are many possible outputs. You write a lot of unit tests. If you then change how one of the branch points is handled, a whole class of tests will fail. This is bad for our rule.

The result of this is that you need to refactor that code so that your functions have a minimum number of branch points (ideally 0 or 1). Additionally, if you split apart that function so that it is now several functions, you have to make each of the functions available to your test suite. This exposes rather than hides these interfaces.
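A rough Python sketch of that kind of split, assuming a hypothetical shipping-cost calculation (all names and prices are invented). Each helper has at most one branch point and is directly visible to the tests; the top-level function has none.

```python
def base_rate_cents(weight_kg: float) -> int:
    # One branch point: light parcels versus heavy parcels.
    return 500 if weight_kg <= 2.0 else 900


def express_surcharge_cents(express: bool) -> int:
    # One branch point.
    return 400 if express else 0


def apply_member_discount(cost_cents: int, member: bool) -> int:
    # One branch point; members get 10% off. Integer arithmetic
    # avoids float rounding surprises.
    return cost_cents * 90 // 100 if member else cost_cents


def shipping_cost_cents(weight_kg: float, express: bool, member: bool) -> int:
    # Zero branch points: pure composition of the pieces above.
    subtotal = base_rate_cents(weight_kg) + express_surcharge_cents(express)
    return apply_member_discount(subtotal, member)


def test_base_rate() -> None:
    assert base_rate_cents(1.0) == 500
    assert base_rate_cents(2.5) == 900


def test_express_surcharge() -> None:
    assert express_surcharge_cents(True) == 400
    assert express_surcharge_cents(False) == 0


def test_member_discount() -> None:
    assert apply_member_discount(1000, True) == 900
    assert apply_member_discount(1000, False) == 1000


def test_composition() -> None:
    # A single case that documents how the pieces are wired together.
    assert shipping_cost_cents(1.0, True, True) == 810
```

Changing how the surcharge branch works should now break `test_express_surcharge` and the single composition case, rather than a whole class of tests written against one large function.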

The end result is that you decouple the operation of your code. When you hear about TDD being "Test Driven Design", this is what it means. This is especially true for things like global variables (or near global instance variables in large classes). You can't get away with it because if your functions depend on a lot of global state, you end up having tests that depend on that (near) global state. When you change the operation surrounding that state, a whole whack of tests fail.

Committing to writing high coverage unit tests which also have high specificity forces you to write decoupled code that doesn't rely on hidden state. And because it doesn't depend on hidden state, you have to be able to explicitly set up the state in your tests, which forces you to write code where the dependencies on the objects are clear and uncomplicated.
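As a small illustration of hidden versus explicit state, consider the following Python sketch. The module-level `_DISCOUNT_PERCENT` and both functions are hypothetical names chosen for the example.

```python
# Hidden state: a module-level value that any code may change.
_DISCOUNT_PERCENT = 10


def price_with_hidden_state(net_cents: int) -> int:
    # The result silently depends on _DISCOUNT_PERCENT, so every test
    # of this function also depends on whoever last touched it.
    return net_cents - net_cents * _DISCOUNT_PERCENT // 100


def price_explicit(net_cents: int, discount_percent: int) -> int:
    # The dependency is a parameter, so a test sets up exactly the
    # state it needs and nothing else can interfere.
    return net_cents - net_cents * discount_percent // 100


def test_price_explicit() -> None:
    assert price_explicit(1000, 10) == 900
    assert price_explicit(1000, 0) == 1000
```

A test for `price_with_hidden_state` would have to save, overwrite, and restore the module variable; the explicit version needs no setup beyond its arguments.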

You mentioned code coverage. I'm going to say that I almost never check code coverage when I'm doing TDD. That's because if you are writing tests that cover all the behaviour, you will have 100% code coverage and 100% branch coverage. However, as you correctly point out, the opposite is not the case. The test for the coverage of your code is not a code coverage tool, it's changing the behaviour of the code and noting that the tests fail.

Most people are familiar with the idea of "Test First" and often equate that with "Test Driven". "Test First" is a great way to learn "Test Driven", but it is not the only way to go. When you have full test coverage, you can easily modify the code and observe how the tests fail. The tests and the production code are two sides of the same coin. When you change one, you must change the other. It's like double entry accounting. By modifying the production code and seeing how the tests fail, you can get information on what this code is related to. You no longer need to keep it in your head!

When I have a well tested piece of code and somebody asks me, "How hard is it to do X", I just sketch up some code that grossly does X and take a look to see where the tests fail. This tells me roughly what I'll need to do to accomplish X.

I see I've failed (once again) to keep this post small. Let me leave you with just one more idea. You will recall that earlier I mentioned that in order to have "full coverage" of unit tests with specificity, you need to factor your code into very small pieces and also expose all of the interfaces. You then have a series of "tests" that show you the input for those functions with the corresponding outputs. The inputs represent the initial state of the program and the outputs represent the resultant state. It's a bit like being in the middle of a debugging session and saving that state. When you run the tests, it's like bringing that debugging session back to life. The expectations are simply watch points in the debugger.

When I'm debugging a program with a good suite of unit tests, I never use a debugger. It is dramatically faster to set up the scenario in the tests and see what happens. Often I don't have to do that. I often already have tests that show me the scenario I'm interested in. For example, "Is it possible for this function to return null -- no. OK, my problem isn't here".

Richard Stallman once said that the secret to fixing bugs quickly is to only debug the code that is broken. "Unit tests" allow you to reason about your code. If you have so called unit tests that are unreadable, then you are giving up at least 50% of the value of the test. When I have problems, I spend more time looking at the tests than the production code -- because it helps me reason about the production code more easily.

I will leave you with one (probably not so small) caveat. Good "unit testing" and "good TDD" is not for everyone. I talked about ruthlessly decoupling code, simplifying functions to contain single branch points, exposing state, exposing interfaces (privacy is a code smell). There are people for which this is terrible. They like highly coupled code (because it often comes with high cohesion). They like code that depends on global state (because explicitly handling state means having to think hard about how you pass data around). They like large functions with lots of branch points (because it's easier to understand the code as a whole when you have the context together -- i.e. cohesion). Good unit tests and TDD work against that. If you want to write code like the above, I don't think unit tests will work for you.

I personally like this style of programming and I think it is dramatically more productive than many other styles of programming. Not everybody is going to agree. I hope it gives you some insight as to why some people find unit testing and TDD to be very productive, though.

Great response. :) I don't actually see anything there to disagree with. Maybe we have a difference of ... perspective? style? ... on some points, but no actual disagreement. I rather think the two rants are quite complementary. Thanks!

My pleasure. Actually, someone a few seconds ago posted "Keep trying" with respect to my trying to keep it small and then I guess thought better of it and deleted the post. I'm a little sad about that because I really think it was completely on the ball. This stuff is so subtle and it's super hard to say something that has any meaning without burning through a ton of trees. But the downside is that it requires a lot of effort to follow the discussion. I think there is a simple message in there somewhere, but I've yet to find a way to express it.

But I agree completely on the issue of style. I think it really comes down to that. Conflicting styles is one of the hardest things to combat on a team and you often end up with some bizarre hybrid that doesn't work at all.

That person might have been banned, deservedly so. If you want to enjoy the internet you either lurk, or you learn to ignore the haters.

FWIW I liked reading your post. Especially your point about balancing cohesion vs unit testing, as this is something that the TDD evangelists never bother to mention.

This is exactly what I've been doing for a while. I check whether my ideas are SOLID and this has been enough so far. When I change something it only affects its immediate surroundings so things are easy to fix. If something breaks it doesn't cascade throughout the system. I also like to have interfaces for all non-trivial things so I can stub them when I test and provide multiple implementations for different use cases (strategy). With this I feel that I'm insanely productive.

> The result of this is that you need to refactor that code so that your functions have a minimum number of branch points (ideally 0 or 1). Additionally, if you split apart that function so that it is now several function, you have to make each of the functions available to your test suite. This exposes rather than hides these interfaces.

> The end result is that you decouple the operation of your code. When you hear about TDD being "Test Driven Design", this is what it means.

So I think this is what creates Java-itis, promotes over-abstraction and shifts the locus of the semantics of the program away from code flow and towards dynamic runtime data shape, which may be driven via verbose construction, dependency injection, configuration data, or potentially an arbitrarily complex program that writes the actual program. I think it makes programs harder to understand because instead of being locally understood, the dynamic composition must be mentally evaluated to understand what's going on.

It's what Java programmers do to create a domain-specific language in the style of Lisp or Smalltalk, the kind of DSLs that enabled tightly knit teams to be pretty productive but create effectively custom languages that are all but incomprehensible to people coming into the project. There are strong reasons why most development isn't in Lisp or Smalltalk.

I believe abstractions should hide details. If your abstractions have so few details that they only branch zero times or once, the branches will be elsewhere; they'll be in the composition, where they are less visible and no longer comprehensible by merely understanding the language. Instead, one needs to understand the system.

> I talked about ruthlessly decoupling code, simplifying functions to contain single branch points, exposing state, exposing interfaces (privacy is a code smell). There are people for which this is terrible. They like highly coupled code

Exposed state and exposed interfaces are what create highly coupled code. I think you misunderstand what other people mean by the word "coupling". Coupling means local changes have non-local effects. Visibility and hiding is absolutely crucial to reducing coupling because things that can't be observed cannot have non-local effects.

Let's talk about coupling, from less controversial to what would appear to be more controversial in your perspective.

Inheritance has high coupling. Change the base class and chances are you need to modify all the descendants. Base classes are very hard to write such that they will work correctly when any virtual method may have their implementation replaced by a descendant. If the base class and the descendants are maintained by different teams, the ability to refactor the base class is highly limited; not only do they need to worry about the external API, but also the internal API, the implicit contract in which methods the implementation calls and when.

Large interfaces have high coupling. The fatter the interface, the more permutations of conversation that can occur over it. That makes the chances of a change breaking something higher.

Public data structures have high coupling - other code grows to take dependencies on the data structures and you can no longer freely modify the data structures without breaking the client code. Publicly mutable data structures are even worse: code cannot preserve local (to the code) invariants because other code elsewhere may violate those invariants.
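The point about publicly mutable data and local invariants can be sketched in Python as follows. Both cart classes are hypothetical; the invariant in question is `total == sum(items)`.

```python
from __future__ import annotations


class ExposedCart:
    """Publicly mutable fields: the invariant cannot be protected."""

    def __init__(self) -> None:
        self.items: list[int] = []
        self.total = 0

    def add(self, price_cents: int) -> None:
        self.items.append(price_cents)
        self.total += price_cents


class EncapsulatedCart:
    """State is private; callers only get read-only views."""

    def __init__(self) -> None:
        self._items: list[int] = []
        self._total = 0

    def add(self, price_cents: int) -> None:
        self._items.append(price_cents)
        self._total += price_cents

    @property
    def items(self) -> tuple[int, ...]:
        # A tuple copy, so callers cannot append behind the cart's back.
        return tuple(self._items)

    @property
    def total(self) -> int:
        return self._total


def invariant_holds(cart) -> bool:
    return cart.total == sum(cart.items)
```

With `ExposedCart`, distant code can call `cart.items.append(999)` and break the invariant without touching `add`; with `EncapsulatedCart`, the only way to change the contents is through `add`, short of reaching into the underscore-prefixed attributes that Python does not enforce.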

> They like code that depends on global state

This is such a ludicrous straw man it borders on libel! Nobody who prefers data hiding will prefer global state. Just listen to it: it sounds like a contradiction by definition!

I understand why you feel your style makes you more productive. I believe it can make you more productive.

Can you understand why I don't think that style makes a team or a company more productive?

In my opinion this is bang on. Millions of itty-bitty little classes that do so little themselves that they rely on eighteen other itty-bitty dependency-injected "services" are the bane of our industry.

They make it impossible to see how the code is really structured and works, hide state and data flow in the composition (which is where all the bugs then go to lurk), and make it harder to refactor anything without spending days changing all the tests.

For some reason people who are fond of this style seldom use real implementations to test their code, even if those things do no actual I/O, preferring instead to have masses of brittle mock set-up, such that all the tests really prove is that your mental model of how the rest of your code actually works is broken, as you watch the system fail in production in ways that seem surprising to you.

You'll then probably think you should do something about these failures and jump to the other extreme, writing brittle end-to-end tests that are very hard to actually diagnose failures in.


Test coverage is useful: it helps you see what code you should remove as it is never hit.

It's a good observation, but I'm still going to disagree. Let's look at the two cases.

(1) It really is dead code. OK, great, but I've seen people spend a whole day writing hundreds of lines trying to exercise it before they conclude it's truly dead. Is it worth it? If a small volume of dead code is worth expunging at all, I suggest that there are more efficient ways to solve that particular problem.

(2) It should be dead, but it's "revived" by constructing an artificial situation in which it does get called even though it never could in real life. Again, I've seen people waste days on this exercise. Now you're carrying around the dead code and the tests/mocks that make it undead.

So in what situation is there a net benefit? In my experience, any dead code that's found and removed that way is only so at great expense, by people who only found it because they were pursuing the arbitrary 100% goal. I don't think that makes the case that 100% unit test coverage is a goal worth pursuing.

In both your cases you have a situation where someone is creating tests based on what they think will increase coverage the most. I don't think that's necessarily what the parent is saying though. I think what they are saying, is that you can write tests based on the expected/documented behaviour of the module, and if the coverage ends up less than 100%, it's because your module has code paths which are not required according to the expected behaviour. The key is that adding new tests is not the solution unless you can specifically identify expected behaviours that you missed in the tests. Looking at the code and trying to reverse engineer what tests are necessary to achieve 100% coverage will always lead to the situations you describe.

These sorts of tests aren't efficient though. I suspect the only way to get this outcome from your tests is if you encode all your edge cases into end-to-end integration tests (which would reveal which portions of the code can never be hit)... I find that approach to testing to be too expensive, and prefer an approach where success cases and known to-be-difficult cases (like say, using a strange third party tool with weird error signaling) are encoded in end-to-end tests with edge cases being limited to small units of code.

> if you encode all your edge cases into end-to-end integration tests

I think that's what you should do. Your integration tests should check that all your specifications are validated. And your specifications should cover all edge cases.

So yes that's a lot of "slow" tests. But I think the best would be to work on the tooling to make those tests faster and easier to setup, not limit their quantity.

I don't believe tests (and the resulting coverage reports) are a time efficient way to locate dead code. You will need 100% logically covered code, and some of your dead code may be under unit test and end up being included in the category of "covered code". I think the best way to locate dead code is to simplify it or refactor it - and when you approach a well factored code base then it's usually much easier to see which portions are unreferenced.

100% coverage is easy.

    diff old-program.c new-program.c
Done. Every line in the program is covered.

The current version of Unit Testing came out of the Chrysler C3 project, which was written using VisualWorks Smalltalk. (Eventually sparking Extreme Programming and SUnit, which was the ancestor of JUnit in Java land.) Here's the thing about Unit Testing in an environment like that. The best way to code and refactor code would also automatically refactor all of the Unit Tests. In an environment like that, Unit Tests are pretty nimble. There are no long waits for compile times. The entire test suite can run at the press of a button from a widget integrated in your standard development environment. Though, from what I read, the entire Unit Test suite would take an entire 10 minutes to run. However, you could easily just run the tests for the classes you were working on at the time, and reserve the whole suite for checkin time.

So what happens when you move the practice away from this particular kind of Smalltalk environment? Refactorings in most languages are slower without the Refactoring Browser, and often your Unit Tests effectively double the amount of work involved. The velocity of change slows down. Unit Tests might be less nimble to run. A long compile time might be involved. Given those changes, it makes perfect sense that a larger granularity of tests and fewer tests might be more convenient.

It may have been born in the smalltalk world, but it really was fleshed out, challenged, hashed and rehashed in the world of java (and mirrored in other languages like C++, C#, and ruby). The XP forum was very active in building what it meant to unit test and how to do test first development; it was highly motivated by the idea of being able to robustly respond to change with quick feedback loops. What became apparent is that the smalltalk world of small modular highly composable designs is kind of critical and much of the C++ / Java world struggled to achieve that, but there was a lot of advice on design techniques. Over time what's been left is more of an emphasis on test than design when talking about the pros and cons of unit testing. So we now want to validate our software works more than we want techniques to quickly evolve our designs and adapt to change in a robust way. Both achieve the idea of working software. So unit testing as a technique to test your software works may not be as good as integration testing and E2E testing, but that wasn't its sole goal, adaptable design was. Not that design has been left behind, if anything we are seeing quite a lot of focus on things like functional programming and new design ideas for putting software together. The key thing is not really the dogma of practices but the ideas of working software and adapting to change and that the idea of robustness and good design is embraced in the core of what you do.

> What became apparent is that the smalltalk world of small modular highly composable designs is kind of critical

All of the cost/benefit tradeoffs of Smalltalk point towards small granularity. You had "everything is an object" to a very high extent. This meant that objects and lambdas had to be super nimble, because literally everything (with just a literal handful of exceptions) was made out of them.

When these pieces get less nimble, the cost/benefit changes, and the granularity at which you operate changes as well.

> if anything we are seeing quite a lot of focus on things like functional programming

Some of that is also suitable for small granularity.

Right now, I'm trying to figure out how the above applies to Golang.

I believe that contrary to the conventional wisdom one should write tests from top down. First integration tests, then move down to testing individual functions if necessary. Not the other way around.

On my current project, for the first time in my long career, I have 100% code coverage. How did I achieve it? By ignoring best practices on what constitutes a unit test. My "unit" tests talk to the databases, read and write files etc. I'll take 100% coverage over unit test purity any day of the week.

100% coverage doesn't mean much if you aren't testing that the low-level code is doing the right thing.

For example, imagine you're testing a calculator app. Your integration tests make sure that it never crashes, the UI works, basic math is working, etc, but maybe the sin() function is only accurate to two decimal places.
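The calculator example can be sketched in Python. Everything here is hypothetical: `approx_sin` is a deliberately crude two-term Taylor series standing in for the inaccurate sin(), and `display` stands in for the UI layer an integration test would look at.

```python
import math


def approx_sin(x: float) -> float:
    # Hypothetical low-precision implementation: two-term Taylor series.
    return x - x**3 / 6.0


def display(x: float) -> str:
    # Stand-in for the calculator UI, which shows two decimal places.
    return f"{approx_sin(x):.2f}"


def integration_style_check() -> bool:
    # Exercises every line above (100% coverage) and looks only at what
    # the user sees for one input. Passes despite the inaccuracy.
    return display(0.5) == "0.48"


def unit_precision_check(tolerance: float = 1e-9) -> bool:
    # Compares the low-level function against math.sin over several
    # inputs. Fails for approx_sin, exposing the precision problem.
    samples = (0.0, 0.5, 1.0, 1.5)
    return all(abs(approx_sin(x) - math.sin(x)) <= tolerance for x in samples)
```

The integration-style check is green with full coverage, while the unit-level precision check is the one that actually notices the problem.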

Edit: I do not mean to imply that you, specifically, are missing things. Rather, it is possible to write tests that have 100% code coverage while missing many possible bugs, and I think those risks are increased without the presence of unit tests.

> 100% coverage doesn't mean much

That's absolutely right. 100% coverage doesn't say much. In fact, by itself it's a pretty meaningless measure of code quality. But the top-down idea still holds. In practice bugs are more likely to happen at the seams between components, and therefore it's where one should test first, not last. It of course depends on the type of the project but I believe it's true for the majority of projects out there.

100% means nothing, but 30% means something is wrong. Coverage tells you nothing, but the absence of coverage is interesting and useful.

Repeatedly talking to the database (i.e. "retesting known database functionality over and over and over") is not a problem intrinsically. If you have a smallish codebase it's not a problem, that is. But get into the hundred-thousand-line and beyond point, and they become a very serious time-consumer on test runs, which need to be fast for high team productivity.

Learned this the hard way by working on a million-line codebase for a couple years on a team. 40 minute test suite runtimes before you know if you broke something become a serious flow-breaker.

It is of course up to you if you want to design for scalability in the code sense, however.

Top-down / outside-in test-first development is BDD.

The TDD style bottom-up approach can be handy too. On a recent project I wrote most of the tests first, to consider the API before implementation. These should come in handy when future developers join the codebase and need to understand the intended behavior of these components.

Would a future developer care if you wrote your tests before or after the API was implemented?

Actually yes, it's really hard to write a test without mocking for an existing (non-pure) API.

Interestingly, if you go right back to Kent Beck's TDD book, you'll see you're in agreement with him. Also with rails testing as it comes out-of-the-box. Testing individual functions is orthogonal to testing a unit.

how big is your project? what's your test suite run time? I honestly think unit test purity is very worthwhile if you have a large project. It means the difference between a 1 build time vs a 6 minute build time.

What is your stack? I feel like you are doing something odd if total build times are a bottleneck on development.

The number of times I've said "oh I bet this works" just to be wrong when I wrote the tests is countless. For me it's more about having small well-defined interfaces with very strong tests

Moreover, tests that don't break are useless

Though, brittle tests are also useless. We have some snapshot result tests (take a large set of assumptions, push it through the pipes, get a large result set - in our case it's a webpage, but this holds for other settings) and almost every change to code covered by one of those tests breaks it. The developer then updates the test to work again, but was their change logically correct? It can be hard to tell if 500 lines of a 2k expected result changed in minor ways.

So a too-brittle test can be valueless.

Integration tests really are the best bang for the buck. So many times people think they tested their code by mocking something like the database layer and are surprised when the application in production breaks. Everything can be tested in isolation and work perfectly, but often it's how the components work together that determines if the system works at all.

Integration test what you can. Unit test core pieces of logic. Avoid mocks.

Fully agree. The majority of us are writing code which has to talk to a lot of other systems which we didn't write and don't control. These interface areas between systems are huge gathering places for bugs.

Also, mocks are self-deception at best and lies at worst.

In general automated tests are used mostly to save time, as the earlier you find the bug the more savings you have when fixing it. When your testers find it, they need to create and fill the ticket, assign it to a developer, then the developer needs to reproduce it, fix it and assign it back, and the tester needs to retest it. And it's much more expensive if it happens in production! Integration/unit tests can catch a lot of those. There are some diminishing returns so 100% coverage is not needed, and integration tests are more effective in catching bugs, so I agree with the article's idea. Use unit tests for some real units - algorithms, calculations etc, and don't just test the mocking framework. With an additional layer of end to end tests and manual testing, the system should be able to achieve pretty good quality without spending an unreasonable amount of time on it.

This is pretty similar to my approach. In the context of developing an API I use integration tests more than unit tests. It is straightforward to spin up an application server against a test database with some test data. Run some example requests against it and verify the results.

I frequently have Test classes which contain a full scenario related to a specific resource or action. For example:

* create resource
* lookup resource
* modify resource
* list all resources
* delete resource
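A scenario like the one listed above might look roughly like this in Python. This is only a sketch under stated assumptions: it swaps the application server and JSON requests for plain functions over an in-memory SQLite database (the standard `sqlite3` module), and the `resources` table and all function names are invented for the example.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )


def create_resource(conn: sqlite3.Connection, name: str) -> int:
    cur = conn.execute("INSERT INTO resources (name) VALUES (?)", (name,))
    return cur.lastrowid


def get_resource(conn: sqlite3.Connection, rid: int):
    # Returns an (id, name) tuple, or None if the row does not exist.
    return conn.execute(
        "SELECT id, name FROM resources WHERE id = ?", (rid,)
    ).fetchone()


def rename_resource(conn: sqlite3.Connection, rid: int, name: str) -> None:
    conn.execute("UPDATE resources SET name = ? WHERE id = ?", (name, rid))


def list_resources(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT id, name FROM resources ORDER BY id").fetchall()


def delete_resource(conn: sqlite3.Connection, rid: int) -> None:
    conn.execute("DELETE FROM resources WHERE id = ?", (rid,))


def run_resource_scenario() -> None:
    # One full scenario against a real (in-memory) database, no mocks.
    conn = sqlite3.connect(":memory:")
    try:
        create_schema(conn)
        rid = create_resource(conn, "alpha")               # create
        assert get_resource(conn, rid) == (rid, "alpha")   # lookup
        rename_resource(conn, rid, "beta")                 # modify
        assert list_resources(conn) == [(rid, "beta")]     # list all
        delete_resource(conn, rid)                         # delete
        assert get_resource(conn, rid) is None
        assert list_resources(conn) == []
    finally:
        conn.close()
```

The scenario reads top to bottom like the list, and every step goes through real SQL rather than a mocked repository.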

The JSON files for the integration tests are then used as examples in the API documentation.

Unit tests are reserved mostly for verifying business logic components. There is no need to setup a complex ControllerTest with mocked out Services or Repositories as you don't care about the internals anyway. Just the input vs output and the integration tests cover those already.

My comment from the original post verbatim:

I agree with the part that you should write tests, but I definitely disagree with the part that most of your tests should be integration tests.

As you pointed out the testing pyramid suggests that you should write more unit tests. Why? Because if you have ever tried TDD you know that unit tests make you write good (or at least acceptable) code. The reason for this is that testing bad code is hard. By writing mostly integration tests you lose one of the advantages of unit testing and you sidestep the bad code checking part.

The other reason is that unit tests are easy to write. If you have interfaces for your units of code then mocking is also easy. I recommend stubbing though, I think that if you have to use mocks it is a code smell.
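For what it's worth, a hand-written stub behind an interface can be as small as the following Python sketch. `Clock`, `FixedClock` and `greeting` are hypothetical names; `typing.Protocol` plays the role of the interface.

```python
from typing import Protocol


class Clock(Protocol):
    """The interface the production code depends on."""

    def now_hour(self) -> int: ...


class FixedClock:
    """A stub: returns a canned answer and records no expectations."""

    def __init__(self, hour: int) -> None:
        self._hour = hour

    def now_hour(self) -> int:
        return self._hour


def greeting(clock: Clock) -> str:
    return "Good morning" if clock.now_hour() < 12 else "Good afternoon"


def test_greeting() -> None:
    assert greeting(FixedClock(9)) == "Good morning"
    assert greeting(FixedClock(15)) == "Good afternoon"
```

Unlike a mock with call expectations, the stub says nothing about how `greeting` uses the clock, so the test keeps passing through refactorings that preserve behaviour.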

Also the .gif with the man in pieces is a straw man. Just because you have to write at least 1 integration test to check whether the man has not fallen apart is not a valid reason to write mostly integration tests! You can’t test your codebase reliably with them and they are also very costly to write, run and maintain!

The testing pyramid exists for a reason. It is a product of countless hours of research, testing and head scratching. You should introspect your own methods instead and you might arrive at the conclusion that the codebase you are working on is bad and it is hard to unit test, that’s why you have chosen to write mostly integration tests.

Maybe, but do you think 100% coverage is such a bad experience to have at all? It can be a rich exercise and a rite of passage to learn which tests matter, which should be automated (fuzzers, dbunit, etc.), which can wait, and so on, and how to get to 90% with the effort of doing only 30.

Integration tests should write themselves. It depends on the project, but personally I usually get 65% coverage for the price of 5%. For example, if you want to test a bunch of commands, you can make a function like autotest('some command', 'some_fixture.txt'): call "some command", capture the output, write some_fixture.txt with the captured output, and fail, complaining that it had to create some_fixture.txt. On the next run it will find some_fixture.txt, compare the output, and fail only if it differs.
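A minimal sketch of such an `autotest` helper in Python, assuming the command is a shell-style string and that only standard output is worth comparing (both are my assumptions, not something stated above):

```python
import shlex
import subprocess
from pathlib import Path


def autotest(command, fixture_path):
    """Run `command` and compare its stdout with the stored fixture.

    First run: the fixture is missing, so it is written and the test fails,
    which prompts a human to review the newly created file.
    Later runs: fail only if the output differs from the fixture.
    """
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)
    actual = result.stdout
    fixture = Path(fixture_path)
    if not fixture.exists():
        fixture.write_text(actual)
        raise AssertionError(f"{fixture} did not exist; created it, review it and re-run")
    expected = fixture.read_text()
    assert actual == expected, f"output of {command!r} differs from {fixture}"
```

Checking the generated fixture into version control means a change in output shows up as a diff in review, which is where most of the value comes from.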

Unit tests should of course be hand-written, but only to cover what matters, such as bugs, or what you want to refactor with TDD. Of course, any line that's not covered can potentially break when upgrading versions, but this kind of breakage is likely to be revealed by the 90% coverage that I think you can get with minimal effort by applying this recipe. Then you can afford 0-day updates when upstream libraries publish a new release candidate.

100% coverage doesn't prove the code is fully tested, just like having unit/integration/acceptance/smoke and other types of tests doesn't prove that the application works. But this doesn't mean we stop adding tests to our suite, so why stop adding coverage?

I totally agree with "just stop mocking so much stuff". Most of the time we can use the real implementation, and by doing this we'll also increase coverage.

100% unit test code coverage is just Uncle Bob BS and it is infeasible for the most part

And while it will give some assurance about correctness, it will not fundamentally guarantee that the application does what it is supposed to do.

Integration tests are certainly better

I can only see the headline being a "great wisdom" to people who have accepted Uncle Bob and TDD crap for too long and without questioning it. Because it is obvious.

Why is it obvious? Because that's how things were done most of the time.

When you had 640k of RAM and a C compiler writing unit tests was not impossible, but pretty hard. Testing that your app reacted to inputs and acted correctly was doable and easily automatable. And what wasn't automatable would be tested manually.

Now here comes the "holy highnesses" of testing gurus, gaslighting developers and saying that code with no tests (which tests? automated? unit? the definition is purposefully vague) doesn't work or that the only blessed code is the one that is produced through TDD onanism? No thank you

Counterpoint: Integrated Tests are a Scam


I found that argument more convincing, and it also aligns better with my experience.

The point about 100% coverage not being a good goal is pretty solid.

For the benefit of others, it's also worth pointing out that the author isn't talking about "integration tests" but "integrated tests" (as he defines it).

Integration tests = testing individual software modules as a group/combination.

Integrated tests = instead of testing one behavior at a time, multiple behaviors are tested simultaneously. (or in the author's words: "any test whose result (pass or fail) depends on the correctness of the implementation of more than one piece of non-trivial behavior.")

Integration tests are very important.

It is also worth pointing out that the author used to use the term "integration tests" in this post, and only changed to "integrated tests" after writing.

So a claim that these are completely distinct/disjoint ideas is not really supportable.

As far as I can tell, he doesn't explain the name change, but having watched the talk before he changed the name, it looks like "integrated" is a somewhat more general term.

Specifically: if you are "testing individual software modules as a group/combination", how are you going to do that without "depending on the correctness of the implementation of more than one piece of non-trivial behavior"?

I see exactly one way: your "integration" tests have to be trivial, really more akin to characterisation tests. "When I hit this external API with this value, do I get this reaction?"

As far as I can tell, he doesn't talk about this in the post, but he does in the talk/video. These are interface/boundary tests. So you test each unit in isolation, and you test the boundaries. If your unit tests are sufficient and your boundary tests are sufficient, plugging things together will work.

So no integrated/integration tests.

Now I am actually not that strict, I let my boundary tests leak a little towards integration/integrated tests, because it is possible to just not hook up the components right. So you want to check that with some smoke tests. However, once they are hooked up, the more complex behaviours Just Work™.

I agree with the writer that you don't need 100% test coverage, but you still need to check your coverage. I mean you need to run coverage, go through your code, and see if there are any gaps in your testing. That said, I don't agree with "mostly integration". The reason is that he doesn't take into account the ROI of a test; he only takes into account the outcome. I mean, e2e tests are the best at catching bugs, but they are harder to write, harder to debug and harder to maintain. The same goes for integration tests: they are harder to debug, maintain and run than unit tests. Developers forget that their time is money, and that if they spend time on a test just because it makes them feel safe, the project will cost more money. The general rule I use is simple: when you test, test behavior, not code, and do your testing at the lowest level possible.

The thing that changed my approach to testing, at least in OO languages, was being shown that use cases/tasks/"things the system can do" should be first class objects in the model. At that point, you have a single layer that exposes everything of real value in the application, giving you a really simple test surface. Underneath that surface you can refactor to your heart's delight, without worrying about having to maintain lots of pointless unit tests for trivial behaviours on each object - all that matters is maintaining the end to end functionality. So yes, I agree integration tests are the key, and you can architect your application to make this easier. Not over-testing your underlying model is just good old fashioned information-hiding. Testing the UI on top I leave as a matter of taste.
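As a rough illustration of "use cases as first-class objects", here is a Python sketch. The banking domain and the names `Account`, `Bank` and `TransferFunds` are invented for the example; the idea is that the test targets the use case and never touches the model underneath directly.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: int = 0


@dataclass
class Bank:
    accounts: dict = field(default_factory=dict)


@dataclass
class TransferFunds:
    """A use case as a first-class object: one thing the system can do."""

    bank: Bank

    def execute(self, source: str, target: str, amount: int) -> None:
        src, dst = self.bank.accounts[source], self.bank.accounts[target]
        if amount <= 0 or src.balance < amount:
            raise ValueError("invalid transfer")
        src.balance -= amount
        dst.balance += amount


def test_transfer_moves_money():
    bank = Bank({"a": Account(100), "b": Account(0)})
    TransferFunds(bank).execute("a", "b", 30)
    # Only the observable outcome of the use case is checked.
    assert (bank.accounts["a"].balance, bank.accounts["b"].balance) == (70, 30)
```

How `Account` and `Bank` are structured can change without this test noticing, as long as the transfer still behaves the same.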

>> Write tests. Not too many. Mostly integration.

If a group of reputable programmers got together and published a book about programming which had a single page which only contained this line in a large font, it would be the most useful and valuable programming book ever written.

Definitely not. See my comment on the parent.

Even at 100% code coverage it is possible to have missed many code paths that involve nonlocal control flow. The most common issue here would be exceptions in languages that support them. In most operating systems there are also various interrupts that a program can encounter (e.g., a lovely SEGFAULT in many *nix-type operating systems) and other operating system and hardware level issues (like the OOM killer in Linux, or an NMI or ECC memory error in embedded systems).

So, 100% code coverage may not even be enough for some applications, as was discussed in another top-level thread (medical devices), or in other hard-to-fix (spacecraft) or life-threatening situations.
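A small Python illustration of the exception case, with a made-up function: a single test executes every line of `parse_ratio`, so a line-coverage tool would report 100%, yet two ways of leaving the function are never exercised.

```python
def parse_ratio(numerator: str, denominator: str) -> float:
    return int(numerator) / int(denominator)


def test_parse_ratio():
    # This one test runs every line of parse_ratio: 100% line coverage.
    assert parse_ratio("6", "3") == 2.0


def untested_paths():
    """Inputs the test above never tries; each leaves parse_ratio via an exception."""
    outcomes = {}
    for args in [("6", "0"), ("six", "3")]:
        try:
            parse_ratio(*args)
            outcomes[args] = "ok"
        except (ZeroDivisionError, ValueError) as exc:
            outcomes[args] = type(exc).__name__
    return outcomes
```

Calling `untested_paths()` reports a `ZeroDivisionError` for the first input and a `ValueError` for the second, neither of which the fully "covered" test ever sees.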

We have a new guy at work who loves unit tests. Yammers about them constantly.

If he's working on a feature that needs changes to classes A, B and C then his workflow is: make all the changes to class A, along with lots of unit tests. Then do the same for class B. Finally, do the same for class C. And then, after days of work during which the application doesn't even compile, he runs the integration tests and discovers that the application is broken.

Then someone comes over and rewrites his code so it works.

Don't listen to coding advice from people who can't write robust code.

Especially testing advice.

If you want testing advice, start with the pool of people you know already write robust code.

So... that rewrite of his code is pretty quick and easy because the components you're integrating are well covered by tests at that point?

Unless the tests on A, B and C are ineffective I think this comment isn't really about tests - some hired developers just aren't great at writing software but if they all used vim that wouldn't mean the cause of their errors was the fact that they used vim.

The tests are ineffective. That's because, as the article notes, unit tests - tests of single functions, with dependencies mocked - give very little confidence that code actually works. Problems usually appear in the relationships between parts.

And because they're tightly bound to the internal implementation of the code they test the tests often have to be rewritten when the code is rewritten.

Does this mean unit tests are bad? No. The way they're frequently used, I believe they are bad, but you're correct that that wasn't the point I was making. I'm really commenting on the way that common wisdom pushes gimmicky methodologies but doesn't talk about the fundamentals of how to make code changes in an effective way.

I agree with you then - there is nothing inherently beneficial from the quantity of unit or integration tests you have, the quality of the tests to accurately target specific edge/error cases is what makes them valuable.

As an aside, I find TDD a particularly bad trend for these sorts of bad habits. TDD is a great idea in theory, but you'll often see TDD unit tests come out mimicking the code under test, i.e. "How to test if this function containing `return n + n ^ 2` is correct... well, let's generate some random numbers and then see if the output is equal to that random number plus its square! That's like full coverage!" Having tests that duplicate the code under test is a pretty easy trap to fall into with micro-tests, and it should make you suspicious of any test that covers a very small unit of code.
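Here is a Python sketch of that trap. I am assuming the comment means "n plus n squared"; typed literally into Python, `n + n ^ 2` is actually `(n + n) XOR 2`, which happens to make a handy deliberate bug for the demonstration. A test that mirrors the implementation repeats the mistake and passes, while a test with hand-computed values catches it.

```python
import random


def n_plus_square(n: int) -> int:
    # Deliberate bug for illustration: in Python `^` is bitwise XOR, not a power,
    # so this evaluates (n + n) ^ 2 instead of n + n**2.
    return n + n ^ 2


def mirrored_test() -> bool:
    """Re-implements the code under test, so it repeats the same mistake."""
    for _ in range(100):
        n = random.randint(-1000, 1000)
        if n_plus_square(n) != n + n ^ 2:
            return False
    return True


def example_based_test() -> bool:
    """Compares against values worked out by hand: 3 + 9 = 12, 0 + 0 = 0, -2 + 4 = 2."""
    cases = {3: 12, 0: 0, -2: 2}
    return all(n_plus_square(n) == expected for n, expected in cases.items())
```

`mirrored_test()` passes on every run, while `example_based_test()` fails on the first case, because `n_plus_square(3)` evaluates to 4 rather than 12.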

This is similar to a statistical problem involving over-fitted trend lines, you can construct a mathematical formula that can go through any number of arbitrary (X,Y) points, leading to an average deviance of 0, but this formula will probably break as soon as you take another sample.
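The over-fitting analogy can be made concrete with a few lines of Python. The sample points are invented for the example, and Lagrange interpolation is simply one standard way to build a polynomial through arbitrary points.

```python
from fractions import Fraction


def interpolating_polynomial(points):
    """Return the Lagrange polynomial passing exactly through every (x, y) in points."""
    def p(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return p


# Made-up samples that roughly follow y = x, with one slightly noisy point.
samples = [(0, 0), (1, 1), (2, 2), (3, 4)]
fit = interpolating_polynomial(samples)

# Zero deviation on the samples the formula was built from...
assert all(fit(x) == y for x, y in samples)
# ...but the next point comes out at 8, far from the rough y = x trend.
next_value = fit(4)
```

A test that was derived from the implementation is in the same position as this curve: perfect on what it has already seen, and not much use for anything else.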

> that wouldn't mean the cause of their errors was the fact that they used vim

But neither would you say that using vim is a metric of code quality. You might even discover that using vim or not is pretty much irrelevant.

If what I've seen is any guide, the rewrite is pretty quick because the developer doing it is way more productive than the first one because he knows exactly where he is going and what needs to be done.

> you find yourself spending time testing things that really don’t need to be tested

The thing with 100% code coverage is not that one should write more tests, but that you should refactor everything that isn't business logic (and, thus, nothing that would be part of the functionality of the application to test) into third party libraries, if it doesn't already come from third party libraries, which it probably should.

I think the root of many testing problems is that people write code and continuously test manually until they think it works correctly. Then, they start writing some automated tests.

What I think is much more effective is to write tests instead of testing manually while developing. That is, whatever you would do by hand, just implement it in a test. And only at the very end, double-check manually.

Isn't that just TDD?

Similar to it, but not as strict. TDD implies that you write tests first, which also leads the designing of your application code. What I'm describing is more general, where testing happens alternatingly with writing domain code, "roughly" at the same time.

As a counter point to this opinion of Kent's, who I respect greatly, I'd like to propose J. B. Rainsberger's article:

"Integrated Tests Are A Scam" with presentation of the same name.[0]

[0] https://blog.thecodewhisperer.com/permalink/integrated-tests...

I’m not so sure I agree. Often isolated TDD can lead to much cleaner code and more succinct solutions. There is also the benefit of quicker test execution. I’ve seen a noticeable difference in the quality of software delivered to my clients with this approach. Gary Bernhardt covers these topics in great depth, and I highly recommend his material to anyone interested.

I know this is off topic but why is it that people who, presumably, want us to read their words of wisdom put such a pile of crap at the top of the page that I have to hit the page down key five times before I get to the article?

Yeah, this article comes out and says that it's based on gut, not science. So, speaking from anecdotal evidence, unit tests that don't break and frequent deploys are way better than finessing one more Gherkin test.

Those sticky banners and elements injected by Medium are ridiculous. They make a good article hardly readable on an HD desktop monitor. Sad state of things; I remember how, once upon a time, it was a great platform to be on.

At work, the unit tests and data are locked, even the webserver.

Every change of data or requirements causes them to fail, or they fail because of human error.

At home, I do integration testing, and it seems to work much more cleanly and completely.

It's confusing with Rails, because integration tests are now system tests, and the kind of integration tests described in this article are named controller and request tests.

This is good. But the Russian translation is terrible (were the translations some marketing move?). Please ignore the translation and read only the original.

have to disagree here...

not writing unit tests means an architecture change will break your integration tests. hence, you don't have coverage left during an architecture change.

typing information on a unit (class or function) is nice, except it doesn't find all bugs. if it were to find all bugs about a unit, it would be a complete type checker (which usually comes in a system like https://en.wikipedia.org/wiki/Isabelle_(proof_assistant), etc, which is a pain to use...)

thus the only "efficient way" to cover the gap between type information and a full-blown type checking system is unit tests.

integration tests run slower, hence, you are also wasting more time.

integration tests are more difficult for other people to understand, especially if they break.

integration tests are generally less stable than unit tests because they involve more units.

integration tests usually don't cover all (error) scenarios of the particular units involved. you do that in unit tests to get the branch coverage to 100%.

his issue regarding mocking potentially can be solved differently. change your interfaces and code architecture such that you minimize mocking and make unit-testing easy. i.e. experiment moving the boundary between units.

I would say mostly unit and very few integration, if the app codebase is designed well, anyway

Maybe the author of Socket.io and Zeit.co just happens to work on integration problems?

> Well, when you strive for 100% all the time, you find yourself spending time testing things that really don’t need to be tested

I noticed people skip testing what is hard to test and just test what is easy...

Not what really matters

Wow honestly maybe rethink your upsells here.. literally had to scroll half way through the page to get to the actual article. And what's the purpose of having a full screen splash image of a pineapple with sunglasses..

Does Medium force unrelated images into headers? I've never written on it but I can't remember the last time I saw a relevant header image.

You don't have to add a header image, but doing it no matter what seems to be the general thinking on Medium, presumably to get people's attention. See freeCodeCamp's guide on writing for them: https://medium.freecodecamp.org/how-to-get-published-in-the-...

"Medium will automatically make the first image in your article its featured image. This means your article’s og-image meta property will be this image. This image will serve as your story’s ambassador everywhere: social media news feeds, Reddit, Google News — even RSS readers."

"Start considering images early in the writing process. And never publish without at least one image. Otherwise your story will be all but invisible in news feeds."

I agree the header images are rarely relevant, and I wish they weren't considered must haves.

80% of the article is unrelated images and self-promotion but some of the content is quite good.

That pineapple isn't even ripe.
