In response to the article: it's true that "you should very rarely have to change tests when you refactor code", however, most of the time, most coders are changing the expected behavior of the code due to changed or added requirements, which is not refactoring. Tests should change when the requirements change, of course. I am not contradicting the article, only clarifying that the quoted statement does not apply outside refactoring. (Refactoring is improving the design or performance of code without changing its required behavior.)
If you test units in isolation while adhering to the SRP, which results in many smaller units depending on each other to do a task, then simply refactoring without changing behavior screws up a considerable portion of your tests.
As for "tests cast away fear", that is definitely true. Whether or not the lack of fear is warranted is something else, and depends heavily on the quality of the unit tests. I've seen plenty of devs confident of their change because it didn't break any unit tests, only to discover that it broke something they forgot to test.
But I have seen how really deep unit tests enable rapid deployment of code to prod since it produces high likelihood of correctness.
However, the refactoring overhead and the LoC bloat are also gigantic.
And we still needed integration tests; and forget about unit tests if there are a lot of network boundaries and a lot of concurrency. They might help, but they start to fail quickly. The facades/mocks just assume too much.
If you are in Java, for god's sake use Spock/Groovy even if your mainline code is Java.
Programming with guard rails :)
Whenever I'm vetting input/output values for a given set of parameters, I move from white-box testing (unit tests) to grey-box testing.
When I'm done with the grey/white-box tests, I make sure integration works as expected.
Why all the hassle of moving through 'the onion'? I wanna make sure to detect misbehaving/unexpected logic as quickly as possible. Hunting down malfunctions detected while running integration tests takes way more time than catching them at the onion's innermost layer (unit tests).
These days I can't imagine writing any kind of secure software without writing tests, regardless of static typing. Static typing does not increase my confidence in code very much.
Static types get rid of other kinds of bugs, and as a result there are unit tests you don't have to write. Of course, that doesn't solve everything or remove the need for tests.
Rust's borrow checker helps get rid of multithreaded data-race bugs, something it is traditionally hard to write any sort of fast unit test for.
Other static analysis tools also help you not write tests by testing certain things on everything. Even linting is another kind of automated meta-testing tool.
Also, for your JSON 'string typing' example: isn't that a hint that maybe you should use statically defined models instead of relying on unit tests to catch typo bugs?
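A minimal Python sketch of that idea (the model and field names here are hypothetical, just for illustration): with a statically defined model, the encoder derives its keys from the declared fields, so a misspelled key can't silently creep in at that layer, and tools like mypy check every field access.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical order model; field names are illustrative.
@dataclass
class Order:
    order_id: int
    total: float

def to_json(order: Order) -> str:
    # Keys come from the dataclass fields, so a typo can't creep
    # into the encoded output here.
    return json.dumps(asdict(order))

# The stringly-typed version, by contrast, lets the typo through:
def to_json_stringly(order: Order) -> str:
    return json.dumps({"order_idd": order.order_id, "total": order.total})
```

The weaker version compiles and runs without complaint; only a unit test (or a downstream consumer) would ever notice the bad key.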
Misspelling a key while encoding to JSON: in a system that handles JSON, I would consider the JSON part to be effectively dynamically typed. So the fact that errors can be introduced in that part of the system can be seen as a result of moving away from static types.
Adding two values where you should have subtracted: in general this can't be detected by static typing, but there are examples where it can (for instance, adding two pointers in C is a type error, because only subtraction is supported between pointers).
Forgetting to log an error: personally, I don't see this as practical at the moment, but if the popularity of Rust makes linear typing a bit more "normal" to people, then error values that it is a type error not to use become a possibility.
Personally, I'm a believer in using languages with simple static type systems, but pretending they have fancier type systems by the way of comments and asserts. Sure, a compiler can't check the invariant you mentioned in your comment, but with the language tech we have at the moment (and I would guess for the foreseeable future) a typical human can't understand a moderately complicated invariant that a compiler can check.
I can’t see why anyone would want not to do this.
From there on you just use JSON.NET to convert your JSON string to an instance of the root type, literally one line of code.
And after that you get code-completion and type-inference and compile-time checking for all data-object access.
It’s an absolute no-brainer.
> I'd argue this is more cost effective than tests
It depends on when the types are introduced.
Choosing to write a project in a strong statically typed language might be quite low cost. This often depends on library availability, e.g. pick a language like Python and there might be off-the-shelf libraries to do most of the heavy lifting; pick, say, StandardML and this will probably need to be written in-house.
Trying to add static types to an existing project might be quite difficult, and may give us hardly any confidence in the code. For example, if we're adding optional/gradual types to our code, but that information keeps getting lost when passing data through an untyped third-party library.
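A small Python illustration of that erasure (the helper here is a hypothetical stand-in for an untyped third-party library): gradual annotations stop helping the moment data flows through untyped code, because the checker sees `Any` coming back.

```python
def legacy_transform(data):  # stand-in for an untyped third-party helper
    return data              # a checker like mypy infers Any here

def get_port(config: dict) -> int:
    # The annotations are useless past this call: once the value
    # round-trips through untyped code, the checker sees Any and
    # accepts whatever comes back.
    return legacy_transform(config)["port"]

# Type-checks cleanly, yet returns a str at runtime:
port = get_port({"port": "8080"})
```

So the project "has types" on paper, but the confidence they provide evaporates at every untyped boundary.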
I often see tests that are basically checking that the code didn't change, i.e. checking that function calls (on injected mock objects) were made in an exact, specific order, instead of checking for some correct end result.
I'm not saying you shouldn't test your sum function using unit tests, but ultimately users don't care so much about the sum function, they care that they have a table, and it should show a sum, and so that functionality should be tested.
There are other people who treat the phrase "public interface" as meaning whichever way that users interact with the system, e.g. commandline arguments, HTTP request parameters, exposed modules of a library, etc.
Sounds like you're using the first terminology and others are using the second.
(I wrote about this at http://chriswarbo.net/blog/2017-11-10-unit_testing_terminolo... )
Yes, sometimes you have to make an unclean cut into your code to test something. But this is not the default situation. That's why the article has that "mostly integration" part in its title.
I hope many managers and programmers out there don't take this the wrong way. I've been an engineer on a project that was attempting to get 100% code coverage on a piece of software I was writing. I heard constant remarks during this period similar to "You don't need 100% code coverage, it doesn't do anything!" The engineers I was working with had only read articles like this and didn't stop to think about what the article was trying to say. From my experience there is no safe rule of thumb for how many tests should be implemented for a project. There should be just enough to feel safe (as hathawsh has said). If you're recommending to engineers on your team to stop implementing tests when they say "I'm at 60% coverage and it'll take ~2 days to get to 100%", I'd really hope you take the time to understand why they want 100% coverage.
The software I was working on when my coworkers started telling me to stop writing tests was code designed to trigger alarms when patients' vital signs met certain criteria set by doctors. I am very thankful that I did hit 100% coverage, because between 60% and 100% there were many small edge cases that, had they caused a death, I wouldn't have been able to sleep well. Had they said I was explicitly disallowed from working on the tests, I would have come in on a weekend (or taken PTO) and implemented them then. It's our ethical responsibility to know when and where paranoia is worth the marginal time penalty.
Most of us aren't working on life or death code like this. My React app doesn't need 100% code coverage but you put it well when you said "There should be just enough to feel safe"
But you're not saying the right words here. 100% doesn't mean you've covered every edge case (although I suppose 60% necessarily means you _haven't_). I can hit 100% without actually asserting anything.
I think it's harmful to talk about 100% without also considering mutation testing, boundary analysis, manual mutation testing...
The senior engineers were not supportive of other forms of testing in any way (even the user testing). They flat out refused my proposal to attempt to create an integration test suite covering its communication with other services, as well as other services consuming its data (I started this over the weekend and was told not to proceed).
tl;dr: You're absolutely correct but I still think my testing procedures were better than nothing.
I am sure they were, but you also said:
>take the time to understand why they want 100% coverage
which to me indicates that they are under the false impression that this gives some sort of completeness, i.e. "make sure this plastic fence covers the entire stretch of mountain road".
We have a coverage threshold of 90% and it is helpful because it works as a reminder if you try to check in code that isn't covered by tests. I think that pushing that to 100% would add little value but a lot of overhead. Never tried it though so I fully admit that it's just my hunch.
Exactly. You are in the tiny minority writing literally life-and-death code, and for which a lot of other common advice given regarding general software development likely does not apply either (like "move fast and break things".) Also, for your type of application I would probably want 100% state space coverage too.
Keep adding tests as long as they tell us something new, give us more confidence, document some regression, etc.
Keep adding tests if they're exposing edge cases (e.g. MAX_INT), even if those code paths are already covered.
Keep adding tests if they're asserting useful things about the results, even if those code paths are already covered.
Stop adding tests when you're only trying to make the coverage number increase.
- 0-80% was adding useful tests.
- 80-98% was mostly useless.
- 98-100% found some really interesting bugs and forced refactoring that made testing easier, and was definitely worth it.
100% coverage still does not mean 100% of all cases tested - it just means that every line has been run with SOME data and every conditional statement was run once in both directions.
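To make that distinction concrete, here is a deliberately minimal Python sketch (the function is hypothetical): a single test executes every line, so a coverage tool reports 100%, while an entire input case remains untested.

```python
def average(values):
    # One line, trivially "covered" by any non-empty input --
    # but it divides by zero for the empty list.
    return sum(values) / len(values)

# This single test runs every line of `average`, so line coverage
# is 100%, yet the empty-list case was never exercised:
assert average([2, 4]) == 3.0

# average([])  -> ZeroDivisionError, the case no test checks.
```

Lines covered and cases covered are simply different quantities, and only the second one is what we actually care about.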
It also depends on the service. Some services are critical infrastructure and should have their edge cases tested. Some services provide non essential functions and can get away with less.
I agree that there is no magic number, but 100% coverage is clearly overkill for the average engineering team. In fact, broadly demanding any code coverage percentage is probably over simplifying the issue. It just shouldn't be 0 :)
The problem with 100% coverage that I think of is that the last 30% is often boilerplate code (generated getters/setters in Java and that kind of thing).
But what if an engineer started with the boilerplate and only then progressed to tests that are actually important? You might get to 30% without testing anything useful at all. Then you might test half of the important stuff and hit an arbitrary metric of 70%.
If someone were ruled by the sonarqube score, they might even do this.
Yeah, thinking about the why matters more than the %.
As others mentioned it also doesn't say anything about state space coverage or input argument space coverage or all combinatorial paths through your code. Only that some line/branch has been hit at least once.
The decidability/halting problem hints that perfect testing is an impossibility for systems of any complexity; not just too expensive in big-O terms, it's flat out not possible on a Turing machine of any power.
That doesn't mean give up all testing, but there is a LOT of dogma in testing philosophy.
As this article shows, the mode of his argument has also been influential: it is a perfect case study of skepticism without cynicism; or considering competing arguments while avoiding the trap of “bothsiderism”. Matt Levine, another writer held in high regard in this community, is very similar in this regard.
There is also a book expanding on the article.
As I understand it, "bothsiderism" is the practice of not taking a stance on either side of an argument and saying "both sides have valid arguments"/"both sides are equally wrong", in the process ignoring that one side is much more right/wrong than the other.
> I think there's blame on both sides, you look at, you look at both sides, I think there's blame on both side, and I have no doubt about it, and you don't have any doubt about it either.
- Donald Trump after violence at the Charlottesville rally
- Everything you always wanted to know about X (but were afraid to ask)
- X, or how I learned to stop worrying and love the Y
The fundamental question of automation is: will I repeat this task enough such that the amortized saving outweighs the cost of writing a script to do it?
Whenever you're thinking of adding a test X, quickly consider how often you and your team are likely to need to manually test X if you don't. Also factor in the cost of writing test X (though that's tricky because sometimes you need to build test architecture Y, which lowers the costs of writing many tests...).
If it's a piece of code that's buggy, changing frequently, brittle, or important, then you're likely to need to validate its behavior again and again. It's probably worth writing a unit test for it.
If it's an end user experience that involves a lot of services maintained by different teams and tends to fall apart often, it's probably worth writing an integration test instead of having to keep manually testing every time the app goes down.
If it's an API with lots of important users using it for all sorts of weird edge cases and you don't want to have to manually repro each of the existing weird edge cases any time you add a new one, it's probably worth writing some black box tests for it.
But if it's code that's simple, boring, stable, short-lived, or unimportant, your time may be better spent elsewhere.
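That amortization question can be reduced to back-of-envelope arithmetic. This is a hypothetical cost model of my own, not something from the comment above, and the numbers are made up:

```python
def worth_automating(script_cost_min, manual_cost_min, expected_runs):
    # Automate when the total manual effort you expect to spend
    # exceeds the one-off cost of writing the test or script.
    return expected_runs * manual_cost_min > script_cost_min

# A brittle checkout flow retested before each of ~52 weekly releases:
assert worth_automating(script_cost_min=240, manual_cost_min=15, expected_runs=52)

# A short-lived prototype checked by hand twice:
assert not worth_automating(script_cost_min=240, manual_cost_min=15, expected_runs=2)
```

The tricky part, as noted above, is that `script_cost_min` drops sharply once shared test infrastructure exists, which tilts the answer toward "yes" over time.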
Another model is that your tests represent the codified understanding of what the system is. This can be very helpful if you have a lot of team churn. How do you make sure a new person doesn't break something when "something" isn't well-defined somewhere? Tests are a great way to pin that down. If this is a priority, then it becomes more important to write tests for things where the test actually very rarely fails. It's more executable system specification than it is automation.
The really nice thing about automated tests is that they can also go much further than I would ever bother to manually. For example, I might manually check that a couple of product pages are showing, with an image, price and description. With automated testing I can check all product pages, every time. It can also test a tedious amount of interactions, e.g. various interleavings of "add to cart", "remove from cart", "checkout", "press back button", etc.
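As a sketch of what testing "a tedious amount of interactions" can look like when automated (the cart model here is a hypothetical toy, not real ecommerce code):

```python
from itertools import permutations

class Cart:
    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)

    def remove(self, item):
        self.items.discard(item)  # removing an absent item is a no-op

def run(ops):
    cart = Cart()
    for op, item in ops:
        getattr(cart, op)(item)
    return cart.items

# Try every ordering of these operations; a human tester would
# check one or two orderings at most.
ops = [("add", "book"), ("add", "pen"), ("remove", "book")]
for ordering in permutations(ops):
    assert "pen" in run(ordering)  # added once, never removed
```

Exhaustively sweeping orderings like this is exactly the sort of tedium where automated tests earn their keep.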
Manual testing still gives me confidence that automated tests can't, e.g. that an "unknown unknown" hasn't taken things down. Yet that's all I need to check manually; if pages are loading and look right, I don't feel a need to manually check all of the ecommerce arithmetic, permission logic, etc.
*100% unit-test coverage is a garbage goal.*
I don't hate unit tests. They can have enormous value shaking out edge cases in a self contained piece of code - usually a "leaf" in the module dependency graph - and making it future proof. Love it. However, unit tests don't tell you anything about whether the module's behavior in combination with others leads to a correct result. It's possible to chain a bunch of unit tests each with 100% coverage together, and still have the combination fail spectacularly. In operations, a lot of bugs live in the interstices between modules.
Even worse, 100% line coverage means a lot less than people seem to think. Let's say one line to compute an expression will compute the wrong value for some inputs. You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case. I see this particularly with ASSERT or CHECK types of macros, which get counted toward 100% coverage even though no test ever exercises the failure case.
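A tiny Python sketch of that ASSERT effect (the function is hypothetical): the guard line counts as covered by every happy-path test, while the failure case it guards against never runs.

```python
def allocate(n):
    # This guard is executed -- and therefore "covered" -- by every
    # passing test, yet no test exercises the failure it protects against.
    assert n > 0, "allocation size must be positive"
    return [0] * n

assert allocate(3) == [0, 0, 0]   # marks the assert line as covered
# allocate(0) would raise AssertionError: the case nothing tests.
```

Branch-coverage tools catch some of this, but assertion macros that expand to a single statement often still count as fully covered.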
Striving too hard for 100% unit test coverage often means a lot of work - many of our unit tests have 3x more tricky mock code than actual test code - to get a thoroughly artificial and thus useless result. The cost:benefit ratio is abominable. By all means write thorough unit tests for those modules that have a good ratio, but in general I agree with the admonition to focus more effort on functional/integration tests. In my 30+ years of professional programming, that has always been a way to find more bugs with less effort.
Ironically, it's actually easiest to get 100% coverage in worse code: the more entangled and coupled your code is, the more likely your tests are to incidentally execute branches that nothing actually tests.
> Even worse, 100% line coverage means a lot less than people seem to think. Let's say one line to compute an expression will compute the wrong value for some inputs. You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case. I see this particularly with ASSERT or CHECK types of macros, which get counted toward 100% coverage even though no test ever exercises the failure case.
I'd rather have 30% test coverage where the lines under test are the complicated ones (not things like factories) with the testing of those lines hitting all the complex edge cases than 100% test coverage that confirms that all your source code is encoded in UTF-8.
There are the developers who don't understand the difference between unit and integration tests. Both are fine, but integration tests aren't a good tool for hitting corner cases.
Many of the tests don't actually test what they pretend to. A few weeks ago I broke production code that had a test specifically for the case I broke, but the test didn't catch it because its input was wrong; the coverage was there all the same.
Most of the tests give no indication of what they're actually testing, or a misleading indication, you have to divine it yourself based on the input data, but most of that is copy/pasted, so much of it isn't actually relevant to the tests (I suspect it was included in the overall test coverage metric).
The results of the code defined the tests. They literally ran the code, copied the file output to the "expected" directory, and used that in future comparisons. If the files don't match it opens a diff viewer, but a lot of things, like ordering, aren't deterministic, so the diff gives you no indication of where things went wrong.
Many tests succeed, but for the wrong reason: they check failure cases but don't actually verify that the code failed for the right reason.
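A minimal Python illustration of that "right failure, wrong reason" trap (function and messages are hypothetical): pin down *why* a call failed, not merely that it raised the expected exception type.

```python
def parse_age(text):
    value = int(text)  # also raises ValueError for non-numeric input
    if value < 0:
        raise ValueError("age must be non-negative")
    return value

def assert_rejects(text, expected_reason):
    # A bare "expect ValueError" check would also pass if the input
    # were merely unparseable; check the failure's reason instead.
    try:
        parse_age(text)
    except ValueError as e:
        assert expected_reason in str(e), f"wrong failure reason: {e}"
    else:
        raise AssertionError("expected a ValueError")

assert_rejects("-5", "non-negative")  # fails for the intended reason
```

With the weak form, `parse_age("abc")` would satisfy a "rejects bad ages" test even if the negative-age check were deleted entirely.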
Some tests "helpers" are actually replicating production code, and the tests are mostly verifying that the helpers work.
Finally, due to some recent changes, the tests don't even exercise production code paths. We can't delete them because that would reduce code coverage, but porting them to actually test the new code will take time they aren't sure they want to invest.
Imagine that the word "test" is a misnomer, when talking about unit tests. Often people think about testing as a way of checking whether or not the code works properly. This is great for what is known as "acceptance testing". However, as you no doubt agree, it's not so great with "unit testing".
For some reason, people hang on hard to the words "unit" and "test" and come to the conclusion that you should take a piece of code (usually a class), isolate it and show that the class does what it is supposed to. This is a completely reasonable supposition, however in practice it doesn't work that well (I will skip the discussion, because I think you're already in agreement with me on that front).
Instead, imagine that "unit" refers to any piece of code (at any level) that has an interface. Next imagine that "test" means that we will simply document what it does. We don't necessarily worry ourselves about whether it is correct or not (though we wish it to be correct). We just write code that asserts, "When I do X, the result is Y".
At the macro level, we still need to see if the code works. We do this either with automated acceptance tests, or manual testing. Both are fine. When the code works to our level of satisfaction, you can imagine that the "unit tests" (that are only documenting what the code at various levels is doing) are also correct. It is possible that there is some incorrect code that isn't used (which we should delete), or that there are some software errors that cancel each other out (which will be rare). However, once the code is working on a macro scale, in general, it is also working on a micro scale.
Let's say we change the code now. The acceptance tests may fail, but some of the "unit tests" will almost certainly fail (assuming we have full "unit test" coverage). If they don't there is a problem because "unit tests" are describing what the code is doing (the behaviour) and if we change the behaviour, the tests should fail.
For some types of unit testing styles (heavily mocked), often the unit tests will not fail when we change the behaviour. This means the tests, as a long-lasting artefact, are not particularly useful. It might have been useful for helping you write the code initially, but if the test doesn't fail when you change the behaviour, it has lost its utility. Let's make a rule: if the test doesn't fail when the behaviour changes, it's a "bad" test. We need to remove it or replace it with a test that does fail.
The other problem you often run into is that when you change one line of code, 200 tests fail. This means that you spend more time fixing the tests than you gained from being informed that the test failed. Most of the time you know you are changing the behaviour, and so you want to have very little overhead in updating the tests. Let's make another rule: Unit tests must be specific. When you change specific behaviour only a few (on the order of 1) tests should fail.
This last one is really tricky because it means that you have to think hard about the way you write your code. Let's say you have a large function with many branch points in it. If you give it some input, then there are many possible outputs. You write a lot of unit tests. If you then change how one of the branch points is handled, a whole class of tests will fail. This is bad for our rule.
The result of this is that you need to refactor that code so that your functions have a minimum number of branch points (ideally 0 or 1). Additionally, if you split apart that function so that it is now several functions, you have to make each of those functions available to your test suite. This exposes rather than hides these interfaces.
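A hypothetical Python sketch of that refactoring (the pricing rules are invented): the entangled version couples three decisions, so changing one branch fails every test that passes through it; the split version keeps each function at a single branch point, so each test stays specific.

```python
# Before: three decisions entangled in one function; changing how
# express pricing works fails every test that goes through it.
def shipping_cost(weight, express, international):
    cost = 5 if weight < 1 else 5 + 2 * weight
    if express:
        cost *= 2
    if international:
        cost += 10
    return cost

# After: one branch point per function, each exposed to the test suite.
def base_cost(weight):
    return 5 if weight < 1 else 5 + 2 * weight

def apply_express(cost, express):
    return cost * 2 if express else cost

def apply_international(cost, international):
    return cost + 10 if international else cost

# Specific tests: a change to express pricing fails only the middle one.
assert base_cost(2) == 9
assert apply_express(9, True) == 18
assert apply_international(18, False) == 18
```

Note that the small functions now have to be reachable by the test suite, which is exactly the "exposing rather than hiding interfaces" trade-off described above.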
The end result is that you decouple the operation of your code. When you hear about TDD being "Test Driven Design", this is what it means. This is especially true for things like global variables (or near global instance variables in large classes). You can't get away with it because if your functions depend on a lot of global state, you end up having tests that depend on that (near) global state. When you change the operation surrounding that state, a whole whack of tests fail.
Committing to writing high-coverage unit tests that also have high specificity forces you to write decoupled code that doesn't rely on hidden state. And because it doesn't depend on hidden state, you have to be able to explicitly set up the state in your tests, which forces you to write code where the dependencies between objects are clear and uncomplicated.
You mentioned code coverage. I'm going to say that I almost never check code coverage when I'm doing TDD. That's because if you are writing tests that cover all the behaviour, you will have 100% line coverage and 100% branch coverage. However, as you correctly point out, the opposite is not the case. The test for the coverage of your code is not a code coverage tool; it's changing the behaviour of the code and noting that the tests fail.
Most people are familiar with the idea of "Test First" and often equate that with "Test Driven". "Test First" is a great way to learn "Test Driven", but it is not the only way to go. When you have full test coverage, you can easily modify the code and observe how the tests fail. The tests and the production code are two sides of the same coin. When you change one, you must change the other. It's like double entry accounting. By modifying the production code and seeing how the tests fail, you gain information on what this code is related to. You no longer need to keep it in your head!
When I have a well-tested piece of code and somebody asks me, "How hard is it to do X?", I just sketch up some code that grossly does X and take a look to see where the tests fail. This tells me roughly what I'll need to do to accomplish X.
I see I've failed (once again) to keep this post small. Let me leave you with just one more idea. You will recall that earlier I mentioned that in order to have "full coverage" of unit tests with specificity, you need to factor your code into very small pieces and also expose all of the interfaces. You then have a series of "tests" that show you the input for those functions with the corresponding outputs. The inputs represent the initial state of the program and the outputs represents the resultant state. It's a bit like being in the middle of a debugging session and saving that state. When you run the tests, it's like bringing that debugging session back to life. The expectations are simply watch points in the debugger.
When I'm debugging a program with a good suite of unit tests, I never use a debugger. It is dramatically faster to set up the scenario in the tests and see what happens. Often I don't have to do that. I often already have tests that show me the scenario I'm interested in. For example, "Is it possible for this function to return null -- no. OK, my problem isn't here".
Richard Stallman once said that the secret to fixing bugs quickly is to only debug the code that is broken. "Unit tests" allow you to reason about your code. If you have so called unit tests that are unreadable, then you are giving up at least 50% of the value of the test. When I have problems, I spend more time looking at the tests than the production code -- because it helps me reason about the production code more easily.
I will leave you with one (probably not so small caveat). Good "unit testing" and "good TDD" is not for everyone. I talked about ruthlessly decoupling code, simplifying functions to contain single branch points, exposing state, exposing interfaces (privacy is a code smell). There are people for which this is terrible. They like highly coupled code (because it often comes with high cohesion). They like code that depends on global state (because explicitly handling state means having to think hard about how you pass data around). They like large functions with lots of branch points (because it's easier to understand the code as a whole when you have the context together -- i.e. cohesion). Good unit tests and TDD work against that. If you want to write code like the above, I don't think unit tests will work for you.
I personally like this style of programming and I think it is dramatically more productive than many other styles of programming. Not everybody is going to agree. I hope it gives you some insight as to why some people find unit testing and TDD to be very productive, though.
But I agree completely on the issue of style. I think it really comes down to that. Conflicting styles is one of the hardest things to combat on a team and you often end up with some bizarre hybrid that doesn't work at all.
FWIW, I liked reading your post. Especially your point about balancing cohesion vs. unit testing, as this is something the TDD evangelists never bother to mention.
> The end result is that you decouple the operation of your code. When you hear about TDD being "Test Driven Design", this is what it means.
So I think this is what creates Java-itis, promotes over-abstraction and shifts the locus of the semantics of the program away from code flow and towards dynamic runtime data shape, which may be driven via verbose construction, dependency injection, configuration data, or potentially an arbitrarily complex program that writes the actual program. I think it makes programs harder to understand because instead of being locally understood, the dynamic composition must be mentally evaluated to understand what's going on.
It's what Java programmers do to create a domain-specific language in the style of Lisp or Smalltalk, the kind of DSLs that enabled tightly knit teams to be pretty productive but create effectively custom languages that are all but incomprehensible to people coming into the project. There are strong reasons why most development isn't in Lisp or Smalltalk.
I believe abstractions should hide details. If your abstractions carry so little detail that they branch only zero times or once, the branches will be elsewhere; they'll be in the composition, where they are less visible and no longer comprehensible by merely understanding the language: instead one needs to understand the system.
> I talked about ruthlessly decoupling code, simplifying functions to contain single branch points, exposing state, exposing interfaces (privacy is a code smell). There are people for which this is terrible. They like highly coupled code
Exposed state and exposed interfaces are what create highly coupled code. I think you misunderstand what other people mean by the word "coupling". Coupling means local changes have non-local effects. Visibility and hiding is absolutely crucial to reducing coupling because things that can't be observed cannot have non-local effects.
Let's talk about coupling, from less controversial to what would appear to be more controversial in your perspective.
Inheritance has high coupling. Change the base class and chances are you need to modify all the descendants. Base classes are very hard to write such that they work correctly when any virtual method may have its implementation replaced by a descendant. If the base class and the descendants are maintained by different teams, the ability to refactor the base class is highly limited; not only do they need to worry about the external API, but also the internal API, the implicit contract in which methods the implementation calls and when.
Large interfaces have high coupling. The fatter the interface, the more permutations of conversation that can occur over it. That makes the chances of a change breaking something higher.
Public data structures have high coupling - other code grows to take dependencies on the data structures and you can no longer freely modify the data structures without breaking the client code. Publicly mutable data structures are even worse: code cannot preserve local (to the code) invariants because other code elsewhere may violate those invariants.
> They like code that depends on global state
This is such a ludicrous straw man it borders on libel! Nobody who prefers data hiding will prefer global state. Just listen to it: it sounds like a contradiction by definition!
I understand why you feel your style makes you more productive. I believe it can make you more productive.
Can you understand why I don't think that style makes a team or a company more productive?
They make it impossible to see how the code is really structured and works, hide state and data flow in the composition (which is where all the bugs then go to lurk), and make it harder to refactor anything without spending days changing all the tests.
For some reason people who are fond of this style seldom use real implementations to test their code, even for things that do no actual I/O, preferring instead masses of brittle mock set-up; in the end all the tests really verify is your mental model of how the rest of your code works, and that model turns out to be broken as you watch the system fail in production in ways that surprise you.
You'll then probably think you should do something about these failures and jump to the other extreme, writing brittle end-to-end tests that are very hard to actually diagnose failures in.
(1) It really is dead code. OK, great, but I've seen people spend a whole day writing hundreds of lines trying to exercise it before they conclude it's truly dead. Is it worth it? If a small volume of dead code is worth expunging at all, I suggest that there are more efficient ways to solve that particular problem.
(2) It should be dead, but it's "revived" by constructing an artificial situation in which it does get called even though it never could in real life. Again, I've seen people waste days on this exercise. Now you're carrying around the dead code and the tests/mocks that make it undead.
So in what situation is there a net benefit? In my experience, any dead code that's found and removed that way is only so at great expense, by people who only found it because they were pursuing the arbitrary 100% goal. I don't think that makes the case that 100% unit test coverage is a goal worth pursuing.
I think that's what you should do. Your integration tests should check that all your specifications are validated. And your specifications should cover all edge cases.
So yes that's a lot of "slow" tests. But I think the best would be to work on the tooling to make those tests faster and easier to setup, not limit their quantity.
diff old-program.c new-program.c
So what happens when you move the practice away from this particular kind of Smalltalk environment? Refactorings in most languages are slower without the Refactoring Browser, and often your Unit Tests effectively double the amount of work involved. The velocity of change slows down. Unit Tests might be less nimble to run. A long compile time might be involved. Given those changes, it makes perfect sense that a larger granularity of tests and fewer tests might be more convenient.
All of the cost/benefit tradeoffs of Smalltalk point towards small granularity. You had "everything is an object" to a very high extent. This meant that objects and lambdas had to be super nimble, because literally everything (with just a literal handful of exceptions) was made out of them.
When these pieces get less nimble, the cost/benefit changes, and the granularity at which you operate changes as well.
if anything we are seeing quite a lot of focus on things like functional programming
Some of that is also suitable for small granularity.
Right now, I'm trying to figure out how the above applies to Golang.
On my current project, for the first time in my long career, I have 100% code coverage. How did I achieve it? By ignoring best practices on what constitutes a unit test. My "unit" tests talk to the databases, read and write files, etc. I'll take 100% coverage over unit-test purity any day of the week.
For example, imagine you're testing a calculator app. Your integration tests make sure that it never crashes, the UI works, basic math is working, etc, but maybe the sin() function is only accurate to two decimal places.
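A tiny Python sketch of that gap, using an invented low-precision sine as a stand-in: a coarse accuracy check (all an integration test is likely to notice) passes, while a tight unit-level precision check would not.

```python
import math


def sin2dp(x):
    # Hypothetical fast approximation, accurate to ~2 decimal places.
    return round(math.sin(x), 2)


err = abs(sin2dp(1.0) - math.sin(1.0))

# A broad, integration-level sanity check is satisfied by coarse accuracy:
assert err < 0.005          # two-decimal accuracy: passes

# ...but a focused unit test demanding six decimals exposes the gap:
assert not err < 1e-6       # this precision requirement is NOT met
```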
Edit: I do not mean to imply that you, specifically, are missing things. Rather, it is possible to write tests that have 100% code coverage while missing many possible bugs, and I think those risks are increased without the presence of unit tests.
That's absolutely right. 100% coverage doesn't say much. In fact, by itself it's a pretty meaningless measure of code quality. But the top-down idea still holds. In practice bugs are more likely to happen at the seams between components, and therefore it's where one should test first, not last. It of course depends on the type of the project but I believe it's true for the majority of projects out there.
Learned this the hard way by working on a million-line codebase for a couple years on a team. 40 minute test suite runtimes before you know if you broke something become a serious flow-breaker.
It is of course up to you if you want to design for scalability in the code sense, however.
Moreover, tests that don't break are useless
So a too-brittle test can be valueless.
Integration test what you can. Unit test core pieces of logic. Avoid mocks.
Also, mocks are self-deception at best and lies at worst.
I frequently have Test classes which contain a full scenario related to a specific resource or action. For example:
* create resource
* lookup resource
* modify resource
* list all resources
* delete resource
The JSON files for the integration tests are then used as examples in the API documentation.
Unit tests are reserved mostly for verifying business logic components. There is no need to setup a complex ControllerTest with mocked out Services or Repositories as you don't care about the internals anyway. Just the input vs output and the integration tests cover those already.
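The scenario style described above can be sketched in Python like this (the in-memory `ResourceStore` is an invented stand-in for the real API under test):

```python
class ResourceStore:
    """Invented stand-in for a resource-oriented API."""
    def __init__(self):
        self._items = {}
        self._next_id = 1

    def create(self, data):
        rid = self._next_id
        self._next_id += 1
        self._items[rid] = dict(data)
        return rid

    def lookup(self, rid):
        return self._items[rid]

    def modify(self, rid, data):
        self._items[rid].update(data)

    def list_all(self):
        return list(self._items)

    def delete(self, rid):
        del self._items[rid]


def test_resource_lifecycle():
    store = ResourceStore()
    rid = store.create({"name": "a"})        # create resource
    assert store.lookup(rid)["name"] == "a"  # lookup resource
    store.modify(rid, {"name": "b"})         # modify resource
    assert store.lookup(rid)["name"] == "b"
    assert store.list_all() == [rid]         # list all resources
    store.delete(rid)                        # delete resource
    assert store.list_all() == []
```

One test class, one full lifecycle, no mocked internals: only input versus observable output.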
I agree with the part that you should write tests, but I definitely disagree with the part that most of your tests should be integration tests.
As you pointed out the testing pyramid suggests that you should write more unit tests. Why? Because if you have ever tried TDD you know that unit tests make you write good (or at least acceptable) code. The reason for this is that testing bad code is hard. By writing mostly integration tests you lose one of the advantages of unit testing and you sidestep the bad code checking part.
The other reason is that unit tests are easy to write. If you have interfaces for your units of code, then mocking is also easy. I recommend stubbing, though; if you have to use mocks, I think it is a code smell.
Also, the .gif with the man in pieces is a straw man. The fact that you have to write at least one integration test to check that the man has not fallen apart is not a valid reason to write mostly integration tests! You can't test your codebase reliably with them, and they are also very costly to write, run and maintain!
The testing pyramid exists for a reason. It is a product of countless hours of research, testing and head scratching. You should introspect your own methods instead and you might arrive at the conclusion that the codebase you are working on is bad and it is hard to unit test, that’s why you have chosen to write mostly integration tests.
Integration tests should write themselves. It depends on the project, but personally I tend to get 65% of the coverage for 5% of the effort. For example, say you want to test a bunch of commands. You can make a function like autotest('some command', 'some_fixture.txt'): it calls "some command", captures the output, writes some_fixture.txt with the captured output, and fails, complaining that it had to create some_fixture.txt. On the next run it finds some_fixture.txt, compares the output, and fails only if they differ.
Unit tests should of course be hand written, but only to cover the things that matter, like bugs, or whatever you want to refactor in TDD. Any uncovered line can potentially break when upgrading versions, but that kind of break is likely to be revealed by the 90% of coverage that I think you can get with minimal effort by applying this recipe. Then you can afford zero-day updates when upstream libraries make a new release candidate.
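The self-writing fixture idea sketched in Python (the helper name and shell-based runner are illustrative; substitute your own command runner):

```python
import pathlib
import subprocess


def autotest(command, fixture_path):
    """Golden-file check: record output on first run, compare afterwards."""
    output = subprocess.run(
        command, shell=True, capture_output=True, text=True
    ).stdout
    fixture = pathlib.Path(fixture_path)
    if not fixture.exists():
        # First run: write the fixture and fail so a human reviews it.
        fixture.write_text(output)
        raise AssertionError(f"created {fixture_path}; review it and re-run")
    # Subsequent runs: fail only if the output drifts from the fixture.
    assert output == fixture.read_text(), f"output differs from {fixture_path}"
```

The first run fails by design; once the recorded fixture is reviewed and committed, every later run is a cheap regression check.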
I totally agree with "just stop mocking so much stuff". Most of the time we can use real implementation and by doing this, we'll also increase coverage.
And while it will give some guarantees about correctness, it will not fundamentally guarantee that the application does what it is supposed to do.
Integration tests are certainly better
I can only see the headline being a "great wisdom" to people who have accepted Uncle Bob and TDD crap for too long and without questioning it. Because it is obvious.
Why is it obvious? Because that's how things were done most of the time.
When you had 640k of RAM and a C compiler writing unit tests was not impossible, but pretty hard. Testing that your app reacted to inputs and acted correctly was doable and easily automatable. And what wasn't automatable would be tested manually.
Now here come the "holy highnesses" of testing gurus, gaslighting developers, saying that code with no tests (which tests? automated? unit? the definition is purposefully vague) doesn't work, or that the only blessed code is the one produced through TDD onanism. No thank you.
I found that argument more convincing, and it also aligns better with my experience.
The point about 100% coverage not being a good goal is pretty solid.
Integration tests = testing individual software modules as a group/combination.
Integrated tests = instead of testing one behavior at a time, multiple behaviors are tested simultaneously. (or in the author's words: "any test whose result (pass or fail) depends on the correctness of the implementation of more than one piece of non-trivial behavior.")
Integration tests are very important.
So a claim that these are completely distinct/disjoint ideas is not really supportable.
As far as I can tell, he doesn't explain the name change, but having watched the talk before he changed the name, it looks like "integrated" is a somewhat more general term.
Specifically: if you are "testing individual software modules as a group/combination", how are you going to do that without "depending on the correctness of the implementation of more than one piece of non-trivial behavior."
I see exactly one way: your "integration" tests have to be trivial, really more akin to characterisation tests. "When I hit this external API with this value, do I get this reaction?"
As far as I can tell, he doesn't talk about this in the post, but he does in the talk/video. These are interface/boundary tests. So you test each unit in isolation, and you test the boundaries. If your unit tests are sufficient and your boundary tests are sufficient, plugging things together will work.
So no integrated/integration tests.
Now I am actually not that strict, I let my boundary tests leak a little towards integration/integrated tests, because it is possible to just not hook up the components right. So you want to check that with some smoke tests. However, once they are hooked up, the more complex behaviours Just Work™.
If a group of reputable programmers got together and published a book about programming which had a single page which only contained this line in a large font, it would be the most useful and valuable programming book ever written.
So, 100% code coverage may not even be enough for some applications, as was discussed in another top-level thread (medical devices), or for other hard-to-fix (spacecraft) or life-threatening situations.
If he's working on a feature that needs changes to classes A, B and C then his workflow is: make all the changes to class A, along with lots of unit tests. Then do the same for class B. Finally, do the same for class C. And then, after days of work during which the application doesn't even compile, he runs the integration tests and discovers that the application is broken.
Then someone comes over and rewrites his code so it works.
Especially testing advice.
If you want testing advice, start with the pool of people you know already write robust code.
Unless the tests on A, B and C are ineffective, I think this comment isn't really about tests. Some hired developers just aren't great at writing software, but if they all used vim, that wouldn't mean the cause of their errors was the fact that they used vim.
And because they're tightly bound to the internal implementation of the code they test the tests often have to be rewritten when the code is rewritten.
Does this mean unit tests are bad? No. The way they're frequently used, I believe they are bad, but you're correct that that wasn't the point I was making. I'm really commenting on the way that common wisdom pushes gimmicky methodologies but doesn't talk about the fundamentals of how to make code changes in an effective way.
As an aside, I find TDD a particularly bad trend for these sorts of bad habits. TDD is a great idea in theory, but you'll often see TDD unit tests come out mimicking the code under test, i.e. "How do we test if this function containing `return n + n ^ 2` is correct... well, let's generate some random numbers and see if the output is equal to each random number plus its square! That's like full coverage!" Having tests that duplicate the code under test is a pretty easy trap to fall into with micro-tests, and it should make you suspicious of any test that covers a very small unit of code.
This is similar to the statistical problem of over-fitted trend lines: you can construct a mathematical formula that passes through any number of arbitrary (X, Y) points, giving an average deviation of 0, but that formula will probably break as soon as you take another sample.
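The tautology trap can be sketched in Python (function and values invented for illustration; note that Python's power operator is `**`, not `^`):

```python
import random


def f(n):
    return n + n ** 2           # code under test


def tautological_test():
    # Re-derives the expectation with the same formula as the code,
    # so it can never catch a wrong formula: it always passes.
    n = random.randint(0, 100)
    assert f(n) == n + n ** 2


def independent_test():
    # Checks against independently known values instead.
    assert f(0) == 0
    assert f(1) == 2
    assert f(3) == 12
```

Only the second test would fail if someone changed the formula to, say, `n - n ** 2`.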
But neither you would say that using vim is a metric of code quality. You might even discover that using vim or not is pretty much irrelevant.
The point of 100% code coverage is not that one should write more tests, but that you should refactor everything that isn't business logic (and thus nothing that would be part of the application's testable functionality) into third-party libraries, if it doesn't already come from third-party libraries, which it probably should.
What I think is much more effective is to write tests instead of testing manually while developing. That is, whatever you would check by hand, implement in a test instead. And only at the very end, double-check manually.
"Integrated Tests Are A Scam" with presentation of the same name.
Every change of data or requirements causes them to fail, either legitimately or through human error.
At home, I do integration testing, and it seems to work much more cleanly and completely.
not writing unit tests means an architecture change will break your integration tests. hence, you have no coverage left during an architecture change.
typing information on a unit (class or function) is nice, except it doesn't find all bugs.
If it were to find all bugs about a unit, it would be a complete type checker ( which usually comes in a system like https://en.wikipedia.org/wiki/Isabelle_(proof_assistant), etc, which is a pain to use... )
thus the only "efficient way" to cover the gap between type information and a full blown type checking system are unit-tests.
integration tests run slower, hence you are also wasting more time.
integration tests are more difficult for other people to understand, especially when they break.
integration tests are generally less stable than unit tests because they involve more units.
integration tests usually don't cover all (error) scenarios of the particular units involved. you do that in unit tests to get the branch coverage to 100%.
his issue regarding mocking potentially can be solved differently. change your interfaces and code architecture such that you minimize mocking and make unit-testing easy.
i.e. experiment moving the boundary between units.
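One way to move that boundary, sketched in Python (names invented for illustration): extract the pure decision logic from the I/O so the unit test needs no mock at all.

```python
def parse_discount(raw):
    """Pure function: trivially unit-testable with no mocks."""
    pct = int(raw.strip().rstrip("%"))
    if not 0 <= pct <= 100:
        raise ValueError(raw)
    return pct / 100


def apply_discount(price, raw):
    # The I/O-facing caller becomes a thin shell around the pure core.
    return price * (1 - parse_discount(raw))


def test_parse_discount():
    assert parse_discount(" 25% ") == 0.25
    assert apply_discount(200, "10%") == 180.0
```

Before the split, testing the discount rule would have meant mocking whatever carried the raw string in; after it, the logic is tested directly.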
I've noticed people skip testing what is hard to test and just test what is easy...
Not what really matters
"Medium will automatically make the first image in your article its featured image. This means your article’s og-image meta property will be this image. This image will serve as your story’s ambassador everywhere: social media news feeds, Reddit, Google News — even RSS readers."
"Start considering images early in the writing process. And never publish without at least one image. Otherwise your story will be all but invisible in news feeds."
I agree the header images are rarely relevant, and I wish they weren't considered must haves.