"consider instead writing higher level tests that each each test more of the code."
I really like the way they keep repeating this as the answer to so many questions. To me it reads as a softly softly approach to weaning people off what appears to be a mania for micro level unit testing, driven by people like Uncle Bob.
> ... micro level unit testing, driven by people like Uncle Bob.
"The structure of your tests should not be a mirror of the structure of your code. The fact that you have a class named X should not automatically imply that you have a test class named XTest."
"The structure of the tests must not reflect the structure of the production code, because that much coupling makes the system fragile and obstructs refactoring. Rather, the structure of the tests must be independently designed so as to minimize the coupling to the production code."
I really wish Uncle Bob would provide code examples with his blog posts and videos. I struggle to imagine what my tests would look like if I followed this. Possibly as my tests are so tightly coupled right now that refactoring is actually not possible in some cases.
Does anyone know of explanations of this with a more hands-on approach, or is this simply a collection of ideas that can’t really be shown?
Practical advice? Consider making every type internal. Then open them up to what is strictly required to be public.
Now you are testing the behaviour of your application through its public surface. This reduces the brittleness of your tests because you can change the internal implementation without rewriting your tests. You have higher assurance. It'll also force your hand to enforce invariants and place guard clauses in the right places rather than "everywhere".
If you follow Cockburn's Ports and Adapters approach too, then you can substitute adapters like persistent state, buses, HTTP clients for appropriate in-memory equivalents.
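A minimal sketch of what that substitution could look like, assuming a hypothetical `UserStore` port and `register_user` use case (neither comes from a real codebase). The test drives the public behaviour and swaps the persistence adapter for an in-memory one:

```python
from typing import Dict, Optional, Protocol


class UserStore(Protocol):
    """Port: what the application needs from persistence (hypothetical)."""

    def save(self, user_id: str, email: str) -> None: ...

    def get_email(self, user_id: str) -> Optional[str]: ...


class InMemoryUserStore:
    """Test adapter: satisfies the same port, backed by a plain dict."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, user_id: str, email: str) -> None:
        self._rows[user_id] = email

    def get_email(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)


def register_user(store: UserStore, user_id: str, email: str) -> bool:
    """Public behaviour under test: reject bad emails and duplicate ids."""
    if "@" not in email or store.get_email(user_id) is not None:
        return False
    store.save(user_id, email.lower())
    return True


store = InMemoryUserStore()
first = register_user(store, "u1", "Ann@Example.com")
duplicate = register_user(store, "u1", "other@example.com")
stored = store.get_email("u1")
```

Nothing in the assertions would need to change if the production adapter moved from one database to another, since the test only knows about the port.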
If you're writing a web application, I've had a reasonably happy time with tests at the controller level, or perhaps right below. You want an interface where values go in and out that correspond to fairly user-level ideas, but which are still tractable for testing.
Integration tests are testing multiple modules and the way they cooperate. These are more like feature tests. Take a module as your unit, mock externals, give it an input and check the output/outcome.
>Then, clearly, if a small change to the production code causes large changes to the tests, there is a design problem.
>
>Therefore the tests must have their own design. They cannot simply follow along with the structure of the production code.
The jump indicated by the "therefore" confuses me. I essentially couple my test to production behavior, insofar as it is possible. I do this to test program behavior realistically and because I believe tests should, in some sense, form a specification for the program they test. I don't really care about "testing code"; I care about testing behavior.
I don't have the problem of small changes to code necessitating large changes to tests. It makes me wonder what he does that is causing that.
The argument I get against this is that when a higher level test fails, it's harder to locate the lines of code that broke the test unlike when you have lots of unit tests. I don't find this convincing though. Most of the time it's going to be pretty obvious looking at what lines of code have changed since the last working commit. Lots of projects get by without any tests at all as well. I'm not saying to skip testing completely but it comes at a cost and you need to be practical at weighing up how much time to put into it vs how much time you're going to save. Writing unit tests for everything takes time.
There is no silver bullet. Personally, I let a combination of complexity and importance guide my tests.
The more likely it is that a piece of code will break, and the more business damage it will do if it does break, the more tests I wrap around it.
For self-contained algorithms that have a lot of branches or complex cases, I use more unit tests. When the complexity is in the interaction with other code, I write more high-level tests. When the system is simple but critical, I write more smoke tests.
If I’ve got simple code that’s unlikely to break and it doesn’t matter if it does break, I might have no tests at all.
100% In critical areas I would suggest parameterised tests are worth the effort, especially in conjunction with generators. Property-based testing in FP for example, or just test data generators that'll generate a good range.
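As a rough illustration of the "test data generator" idea, here is a sketch using only the standard library. `split_bill` is a made-up example of a critical function, and the seeded generator is an assumption standing in for a proper property-based testing library:

```python
import random
from typing import Iterator, List, Tuple


def split_bill(total_cents: int, people: int) -> List[int]:
    """Hypothetical critical function: split a bill without losing a cent."""
    base, remainder = divmod(total_cents, people)
    # The first `remainder` people pay one extra cent.
    return [base + 1 if i < remainder else base for i in range(people)]


def generate_cases(count: int, seed: int = 1234) -> Iterator[Tuple[int, int]]:
    """Seeded generator, so a failing case can be reproduced exactly."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randint(0, 1_000_000), rng.randint(1, 50)


def check_properties(cases: Iterator[Tuple[int, int]]) -> int:
    """Assert properties that must hold for every input, not single examples."""
    checked = 0
    for total, people in cases:
        shares = split_bill(total, people)
        assert len(shares) == people
        assert sum(shares) == total            # no cent lost or invented
        assert max(shares) - min(shares) <= 1  # shares are as even as possible
        checked += 1
    return checked


checked = check_properties(generate_cases(500))
```

The point is that the assertions describe invariants, so widening the generated range costs nothing extra to maintain.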
>The argument I get against this is that when a higher level test fails, it's harder to locate the lines of code that broke the test unlike when you have lots of unit tests. I don't find this convincing though.
This is a valid observation but the problem clears up when you start integrating better debugging tools into your testing infrastructure: having the ability to intercept and inspect API calls made by your app, launch a debugger in the middle of the test, etc.
It is also minimized by writing the test at the same time (or just before) the code that makes it pass.
I'm not sure I agree with the "instead" in his recommendation. "Micro level unit testing" comes with its own benefits, like faster feedback and also design feedback that integration or acceptance style tests won't give you. I like to start out with a high level acceptance test and then write unit tests for every change to the involved classes. You really need a healthy testing pyramid.
The problem is that, in my observation over a number of codebases with many developers: A lot of unit tests end up testing the implementation. When you go to refactor, these tests break which slows you down. To me it seems nearly impossible not to end up with these kinds of net negative tests when you follow a micro approach. I mean the approach outlined in books and blogs where you write a test and then write only a few lines of code. It's all very neat and satisfying but for me it's just a form of fooling myself. I can see this if I honestly measure the net effect of these kinds of tests in the large. The only thing is that they might sometimes help me locate a problem a few seconds faster, but that's all.
Agree. I am so sick of having to code in extremely brittle codebases that are impossible to refactor because of the sheer volume of UTs that are tightly coupled to very specific implementation details--added to satisfy a minimum code coverage metric. UTs were supposed to save us from the pain of refactoring, but in many cases they just trade one time-sink for another, and the cost equation remains the same (not to refactor, because it's too time consuming).
In this same vein, I propose another question:
How similar does the test look to the production code?
These kinds of super-granular, brittle tests have a way of looking very similar to the code they are testing (e.g. super strict mocking of every dependency almost looks exactly the same as the code under test!). That style of testing is basically writing the code twice, and verifying that the second copy does indeed equal the first. I'd rather just write the damn code twice, and save myself from having to figure out all the minute differences between testing libraries that framework jockeys bikeshed over.
> I am so sick of having to code in extremely brittle codebases that are impossible to refactor because of the sheer volume of UTs that are tightly coupled to very specific implementation details
I think this comes from "unit tests" where they have assertions such as
If more methods were written to not call out to disk or some external service (e.g., a database or API) and we had a main method that handled the external communication, then the unit tests would be much less fragile. This would also eliminate the need for mocking since the methods that are tested have no side effects.
For testing dependencies, an environment with a test instance of the database, API, or other external service is needed. Then the code could be tested against an actual implementation rather than a mock of it.
> If more methods were written to not call out to disk or some external service (e.g., a database or API) and we had a main method that handled the external communication, then the unit tests would be much less fragile.
I am not really following. You can write a separate function that handles the database query, then call this function in your code under test, or one of the arguments to the code under test is the database object handler (this allows you to abstract from a specific db handler). But if you don’t mock out the handler with a “dummy handler”, you will be making a real database call. That is not UT.
You will be doing functional testing.
So you can only abstract so much to get your code testable.
> You can write a separate function that handles database query, then call this function in your code under test
You can, but that still requires that you mock that call in order to test the method that calls your database query method. But, if you make it such that your method takes the result of the database query as a parameter, then you can test your method without having to use a mock. In other words, you can change this (which requires mocking get_db_result to test):
    def a_method_to_test(param1, param2):
        db_result = get_db_result(param1)
        # do something with db_result

to this:

    def a_method_to_test(param1, param2, db_result):
        # do something with db_result
        ...
Now you can test a_method_to_test without having to mock out the call to get_db_result by just passing in whatever you want as the db_result parameter.
Basically, doing this will separate code that manipulates data from code that either writes or reads data. You can unit test methods that manipulate data, but you will need to do functional/integration testing of code that writes or reads data from other sources.
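To make that concrete, here is a small sketch of the second shape. `summarize_orders` and the row layout are invented for illustration; the only point is that the "database result" arrives as an ordinary argument, so the test needs no mock:

```python
from typing import Dict, List


def summarize_orders(customer_id: int, min_total: int, db_result: List[dict]) -> Dict[str, int]:
    """Pure data manipulation: db_result is passed in, not fetched here."""
    totals = [row["total"] for row in db_result if row["customer_id"] == customer_id]
    return {
        "count": len(totals),
        "big_orders": sum(1 for total in totals if total >= min_total),
    }


# The test just builds whatever "query result" it wants to exercise.
rows = [
    {"customer_id": 7, "total": 120},
    {"customer_id": 7, "total": 15},
    {"customer_id": 9, "total": 300},
]
summary = summarize_orders(7, 100, rows)
```

The code that actually runs the query still needs a functional or integration test, but it is now a thin layer with very little logic in it.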
Rather than having 3 different business logic methods that call out to the database, you have each construct a command object and then have a separate service that calls out to the database based on these commands. You can then test the 3 business logic methods by testing that they construct the correct commands based on their parameters.
Yeah, exactly. To put it another way: get your business logic components to return data structures (a list of DB actions or an updated dict), and your core controller to turn those data structures into DB calls. Then your tests simply say: here’s a certain environment, what action do you think should be performed?
If you think this sounds like mocking, you’re right (it is like mocking, it isn’t mocking). It has the bonus of making it easier to inspect the logic & expose it to the user ("hey I’m about to do these things, does that sound ok to you?").
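A sketch of that command-object idea, with made-up names (`UpdateBalance`, `InsertAuditRow`, `withdraw`). The business logic returns a list of intended DB actions, and a separate executor (not shown) would be the only code that talks to the database:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class UpdateBalance:
    account_id: str
    new_balance: int


@dataclass(frozen=True)
class InsertAuditRow:
    account_id: str
    note: str


Command = Union[UpdateBalance, InsertAuditRow]


def withdraw(account_id: str, balance: int, amount: int) -> List[Command]:
    """Business logic: decides what should happen, touches no database."""
    if amount <= 0 or amount > balance:
        return [InsertAuditRow(account_id, "withdrawal rejected")]
    return [
        UpdateBalance(account_id, balance - amount),
        InsertAuditRow(account_id, f"withdrew {amount}"),
    ]


# "Here's a certain environment, what action should be performed?"
accepted = withdraw("a1", balance=100, amount=30)
rejected = withdraw("a1", balance=100, amount=500)
```

Because the commands are plain values, they compare with `==` in a test and can also be printed for the "I'm about to do these things" confirmation mentioned above.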
Then maybe just consider doing functional testing. Assertion code in a mock is, by definition, code that tests an implementation. In the example given, suppose you wanted to replace the database with a nosql version. Now you have to throw away or modify your tests even though the behaviour of your system has not changed
> In the example given, suppose you wanted to replace the database with a nosql version. Now you have to throw away or modify your tests even though the behaviour of your system has not changed
Well, not really. Because now I have the DB stuff all contained in a separate function (or handler which can be a proxy to the correct DB engine/type of DB), so with careful design I should be okay.
Here user_manager can abstract away the exact DB type/engine, like Django does.
Of course this is ideal, but that was the motive. I do agree on mocking == testing implementation in general, and functional testing is almost always the way to go... since mocking returns dummy data / fixture, might as well return from a real database. The downside is speed :/ (there are various tricks like setUpClass, or running every test or group of tests in docker containers in parallel, but it still takes much longer than UTs). Ugh trade-offs.
Another way they prevent refactoring is by creating the work for you to implement the same type / number of tests on your new code so your code review doesn't look like it's deleting 100 tests.
I've seen this work out differently on different code bases. Particularly on Rails code bases I've seen some of the symptoms you describe. On more recent Java projects I've seen the opposite where code that needed to change actually just ended up getting swapped out. The "only one reason to change"/open-closed dream come true. Old classes and unit tests just got thrown out. On those, acceptance tests were the pain point when behavior was changed on purpose.
I'm still not sure how to get the good outcome every time. Maybe it's just well designed interfaces.
I recommend https://www.sandimetz.com/99bottles/ to everyone. Also her other book, Practical Object Oriented Design in Ruby.
They both communicate the essential points very clearly.
Also, after reading those books I got a better idea why OOP has a bad reputation, and why people are so bad at writing tests. Somehow these skills are presumed to be easy and already there when in reality they need practice, a lot of it.
You can delete them but the fact that they are irrelevant now even though you didn't change the behaviour of the software means that it was a waste of time writing them in the first place.
Secondly it decreases your confidence in the refactor. In a classic refactor, nothing should break. This is because a good set of tests is a way of documenting and verifying the design/behaviour of the system.
Finally, often in a codebase with micro tests, there will be an expectation that you replace one set of micro tests with another.
>design feedback that integration or acceptance style tests won't give you.
That "feedback" is the pain you feel when you have to maintain one form of tight coupling (ordinary bad code) combined with another (micro-level unit tests).
Higher level tests won't cause you that 'guiding pain' because they aren't as tightly coupled to your implementation.
IMO you don't need to write tens of thousands of lines of unit test code to spot that you're building tight coupling in. It's something you can spot simply by knowing what to look for.
Moreover, if you make a design mistake or introduce tech debt lower down the stack, having a panoply of low level unit tests means that refactoring will break those tests even if you haven't introduced any bugs. IME that causes a dangerous cementing effect on bad code.
I think if you've got a module which is self contained, reusable and tightly scoped, it's worth surrounding with a lower level test. But, if you don't, you're doing more harm than good by surrounding a module with tests.
Rather than being massive at the bottom and then getting thinner as it goes up, there is a continuous band around the top, covering everything, supported by uprights that dive deeper down, just where needed.
That is, you need a layer of high-level tests for everything (for a web application, some mix of browser tests or controller-level integration tests), along with component-level integration tests and isolated unit tests for the bits of the system that are particularly risky (complex, error-prone, critical, new, old, etc).
On the one hand this is true, but as someone who does prefer keeping unit tests relatively small in order to be able to pinpoint issues easier and not have tests break as often without it being clear why, I think the article does strike a nice balance of only recommending it where it really makes sense, rather than simply stimulating you to write as high-level tests as possible.
I don't like it, it reminds me of the time a colleague wrote 2 separate pieces of new functionality and wrote one single end to end test that used both of them, as the "most efficient" way to test it.
Because most of the time, it reduces "test precision".
Which means when such a test detects an issue, there's now a bigger area of the code where the issue could be located.
> Doesn't efficiency outweigh respecting feature boundaries in tests?
In the end, what we're trying to minimize here is feature development time.
This of course depends on the feedback loop duration (build + run tests), which is why test efficiency is important; but there are other factors.
Feature development depends on the time it takes to locate the error when a test fails; a test signalling that "something is wrong somewhere" is not very useful (in some cases, it can be, if it runs extremely fast, because you can generally see the error with "git diff").
However, infinitely precise tests are often undesirable, because they tend to have extremely rigid expectations about the behavior/API of the code under test, discouraging refactoring, and thus slowing down the development process.
Let's keep in mind that "testing = freezing". More precisely, you're freezing what your tests depend upon (by adding friction to future modifications).
So be careful what you freeze: the initial intent of testing is to increase code flexibility. If you can't modify anything without breaking a test, you're probably missing their benefit.
> Because most of the time, it reduces "test precision". Which means when such a test detects an issue, there's now a bigger area of the code where the issue could be located.
Tests are not a debug tool. Tests are here to tell you when you broke something.
When this happens you can get your debugging toolbox out: stack traces, profilers and things like GDB. And then follow the steps your test script did.
All the projects I've most enjoyed working on had clear, explicit test failures, which almost always gave me enough information to hunt down what I'd broken and where. Obviously that's not realistic for some kinds of projects. But even in a legacy C project, if I had to pull out stack traces and profilers and GDB every single time I wanted to understand why a test failed, I would say I'm working in a pretty terrible codebase.
But "when you click on this after that, this happens while it shouldn't" is usually enough info to start debugging: you have reproducible steps. Which you can do with a debugger running so you see exactly where and how things break.
And it is a lot less brittle than "well we tried to refactor some minor thing and now everything is red; but we don't know if the software behavior changed or just because our tests were just checking the implementation".
well, if you're determined to write stupid tests you can do it at any level. And that is not enough info to start debugging unless you had the machine in a known and reproducible state to begin with, which is where we get back to writing focused reproducible tests.
The most important thing here seems to be "For every part of my code change, if I break the code, will a test fail?". Or as I'd put it "if you break the code deliberately does the test actually fail?".
Seen quite a few tests in my time that don't capture the functionality they think they are. They pass but wouldn't be able to tell if the underlying functionality they capture is genuinely broken. This is why I guess the standard practice is to go test red before you go test green.
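One way to picture the "break it deliberately" check, as a toy sketch: both checks below pass against the real `apply_discount` (a made-up function), but only one of them goes red when pointed at a deliberately broken copy.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical production function."""
    return price - price * percent / 100


def broken_apply_discount(price: float, percent: float) -> float:
    """Deliberately broken copy: ignores the discount entirely."""
    return price


def weak_check(fn) -> bool:
    # Only checks that a number comes back, so broken code passes too.
    return isinstance(fn(200, 10), (int, float))


def strong_check(fn) -> bool:
    # Pins down the actual behaviour: 10% off 200 is 180.
    return fn(200, 10) == 180


weak_notices_break = not weak_check(broken_apply_discount)
strong_notices_break = not strong_check(broken_apply_discount)
```

Seeing the test fail first (red before green) is the cheap version of the same experiment.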
I became a believer in automated testing when I worked with a large ETL process that worked with real estate data. This data was input by real estate agents, varied wildly in quality, and had both image files and structured text data. Releasing this code before automated testing was fearful. We'd push changes to staging and then wait for three (!) days of data, then examine the staging and production databases manually and see if there were material differences.
Needless to say, we didn't like to release this code very often.
When I left, we weren't doing automated testing on the image processing, but we were on the text side of things. It became far easier to do a release because we had a set of regression tests that gave us confidence we weren't breaking anything. If something did pop up, we could add it to the suite.
Human beings systematically undervalue their future selves. This is why we have a hard time saving for retirement, exercising and writing tests. Think of your future self and write tests.
> Human beings systematically undervalue their future selves. This is why we have a hard time saving for retirement, exercising and writing tests. Think of your future self and write tests.
This!
What's also worrisome is that sometimes your incentives are misaligned; e.g. when product development and maintenance are handled by different teams, or features have tight deadlines, so the debt is pushed into the future while we deal with what is due now.
>Can I do this faster if I don’t write a test? The answer is always no.
I think the answer is mostly no but it's dangerous to think that it's always no.
I've been given many stories in the past for which writing a realistic automated test would have taken days, manual verification took minutes and the code was fairly isolated and did not get changed very frequently. Writing a test under those circumstances is actually a pretty poor investment.
I would be very surprised if an automated test literally took days to write when a manual test is just minutes.
Also the time savings still pay off later, as automated tests usually take seconds to run and there’s no training required - once it’s in the test suite and the test suite runs are automated, it will always run and quickly identify a failure - no “oops, we forgot to show Jim the Intern that he had to test that part manually...”
Setting up your test suite and automation is longer for sure, but not days. Even a complex manual process can be automated relatively quickly... the manual process should be fairly scriptable in any OS nowadays, and most platforms have great test frameworks.
Trust me you'll come across examples like this one day if you make a concerted effort to automate a test for every story. It's a rough guess but I'd say I don't automate 1 out of 20 stories - I'm not exactly blithely unaware of the benefits an automated test suite brings.
Most of the examples that require "days" (or weeks) for the automated test and 5 minutes for the manual test would involve building or amending an elaborate mock/stub.
Some examples where this happens include rare interactions with weird hardware, odd legacy APIs that are scheduled to be replaced, race conditions and obscure edge cases with mocked services.
I document all of these special cases and they should clearly remain special but I'm not going to blindly assume that automating the story will have a positive ROI.
Think about something that necessarily involves hardware interaction, or a GUI, or where all the interesting error cases are non-deterministic (concurrency, network error handling). Ok, on the last one you're pretty much hosed anyway. But we're not all writing nice data-in/data-out apps.
Actually, I probably wouldn't count network error handling, because you can probably use something like vaurien - https://github.com/community-libs/vaurien - to deterministically mimic bad network conditions in an integration test in a fairly reasonable amount of time.
Also, integrating that tool would probably have applications beyond a single story so even if the change takes 5 minutes and making the test work with vaurien takes half a day, it's probably still worthwhile.
Assuming that no tool like vaurien exists, though (and there are plenty of scenarios out there where you'd have to build it from scratch), building the test scenario could become prohibitively expensive.
Let me help expand your imagination: what if the code is for sshing to prod and needs an ssh-agent or a password typed in? Would you leave those creds on your CI system? Would Security be happy with that?
Because you're developing automated tools to deploy to prod & you're verifying the correctness of the system end-to-end with a no-op command. The connecting to an external system is not what's under test, and assume I've written unit tests that mock out the sshing, and that even with a staging environment I have to smoke test the new code before switching everyone over to the new version.
Where are those env variables stored on the CI system? In the job config? I would use Vault for this.
A) In such cases I think testing against the real thing often has a greater chance of catching subtle bugs rather than the automated test scenario against the elaborate mock which is highly likely to share many of the same assumptions that caused the bug.
B) Code reviews ought to flag that a piece of code that is not under automated test is being modified and appropriate care should be taken (ideally this alert should be automated but I haven't yet reached that level).
I think it pays to approach these things on a case by case basis, and if a pattern of subtle bugs does appear that's a strong indication that you should change your behavior (I'm a stronger believer in bug retros than I am in any kind of testing dogma).
>How many customer minutes lost = 1 developer minute
Is that a relevant question to ask? If you introduce the presumption that an automated scenario test is more likely to catch a bug than a manual test then I guess it's relevant, but honestly for these types of scenario I think the opposite is true.
I didn't mention it before also but if you have manual testers on hand that changes the calculus too. I'd say it's normal for 3-4 manual tester minutes to be equivalent to about 1 developer minute.
As I mentioned above, I really don't think it pays to be a test fundamentalist.
>That said, I think tests are the most expensive and most brittle way to address this problem.
Actually, my personal belief is that people, humans like you and I, are all awful at writing software. We're even worse at enumerating and writing tests.
Miserably bad. Unforgivably bad.
Slowly refining and testing systematizations of correct software building processes is perhaps the most important thing we can do in the first 2 centuries of "software" as a thing.
Because otherwise, all we'll do is continue to wallow in pride and failure, claiming it can't be helped. All the while using language like "case by case," that I have taken to mean: "I will never do that unless you force me to."
Fortunately, I think the scope of failure and fraud in the software industry has grown so large that folks are starting to take correctness as a requirement and not a nice to have. Another Equifax or two and maybe a nice DAO hack or something and folks are going to start saying, "Maybe it's just too bad we all learn to make bad software," turning to new techniques and practices.
Far-fetched? Maybe. But it is happening with AI...
>that I have taken to mean: "I will never do that unless you force me to."
I actually created my own open source BDD framework and a ton of tooling to help automate stories.
19 out of 20 was a pretty conservative estimate of how much I automate - it's probably more like 39 out of 40. I'm a little obsessive about it because I want to dogfood my work properly so I automate quite a few things where the cost/benefit for a normal programmer would seem a bit low.
I'm very cognizant that the industry as a whole is terrible at testing and I'm hoping my software can one day do a small part to help with that.
Isn't the wave of FP causing some folks to do this? It takes time for people to change their attitudes. Consider that we're still having this conversation about automated testing, which was promoted by the XP book in the early 2000s.
I like having tests but I would bet if you audit your production stack it's very likely that a very large portion of it does not really have tests yet is probably the most stable part of your stack, like the Linux kernel etc.
Of course, just because you haven’t personally written tests for parts of a stack doesn’t mean that you shouldn’t write tests for other parts, especially the parts you’ve created.
Also don’t forget that certain things like Linux are extremely battle-tested. It’s generally unlikely that any piece of software you write will receive that much real-world testing, so a growing set of automated tests will help you cover your own arse.
Perhaps this always applies for manual testing. When it comes to automated testing and something like Android, it can potentially take much longer, especially with the architecture that allows easy testing.
How much business value is this test adding? That is, if this test failed and we ignored it, how much would the business suffer?
Is the code easy to test? That is, does the design have lots of self-contained components with well-described input/outputs & conditions/assumptions? Do the docs clearly communicate that?
Will the test still work if we change the implementation? How much work to update/remove the tests if the behaviour has to change to follow new business requirements?
> Have I just made something public in order to test it?
>
> If yes, consider instead writing higher level tests that each test more of the code.
This is one that I struggle with in JS with React.js components. If you have a little helper component in a file that isn't exported but used in the same file by a component that is exported, it is sometimes difficult to test that non-exported component. Because of how enzyme shallow rendering works you don't get the full tree so that component, if sufficiently nested, might not ever be touched. This forces me to export the component just to test it.
I run into the same problem with React and it bugs me from a testing standpoint.
Extracting code from a big component to helper functions and extracting those functions from the component can lead to cleaner code, and it makes it much easier to test the behaviour of the helpers directly than having to render the component with enzyme.
A good example of this is moving state changes to pure functions [1] which makes them much easier to test, but you'll need to export those functions to test them.
> Mocking introduces assumptions, which introduce risk.
This boils down something I've had on my mind a lot of late. Though, with a different spin. I write a lot of Go. I prefer testing interfaces while some others want to use mock generators. This quote captures part of my reasoning behind avoiding mocks. I plan to write a detailed post at some point full of examples. I think this quote will work its way in there.
Being able to run tests in a random order is indeed pretty important. It is not only an indicator that the features were written with high quality, but that the tests are high quality as well. Things such as mocks being mutated between tests do happen, and the developer writing the tests should avoid them.
Can I run these tests more than once? Will they ever go stale?
Can I run these tests and they'll clean themselves up?
Having to update tests because they didn't take into account the date changing (Happy birthday Joe Test!) or manually cleaning up data is a huge time suck.
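A small sketch of both points, using only the standard library. `age_on` and `write_report` are hypothetical; the assumptions are that "today" is passed in rather than read from the clock, and that test files live in a temporary directory that removes itself:

```python
import tempfile
from datetime import date
from pathlib import Path


def age_on(birth_date: date, today: date) -> int:
    """Age in whole years; `today` is a parameter instead of date.today()."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def write_report(directory: Path, text: str) -> Path:
    """Writes into whatever directory the caller provides."""
    path = directory / "report.txt"
    path.write_text(text)
    return path


# Fixed dates: the result is the same no matter when the test runs.
joe_born = date(1990, 6, 15)
day_before_birthday = age_on(joe_born, date(2020, 6, 14))
on_birthday = age_on(joe_born, date(2020, 6, 15))

# The temporary directory and everything in it is deleted on exit.
with tempfile.TemporaryDirectory() as tmp:
    report = write_report(Path(tmp), "ok")
    existed_during_test = report.exists()
cleaned_up = not report.exists()
```

Neither test goes stale on Joe Test's next birthday, and neither leaves anything behind to clean up by hand.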
* Are these tests relying on API that is likely to change?
* Can I make the API surface area used by all tests even smaller?
* Can I make a library that wraps the existing API of the unit to
get a smaller and/or more stable API for use in the tests only?
These 3 help get tests that withstand refactors.
An example:
An acceptance test for an editing form is relying on the save button having a certain CSS style to find it and click it. This is API that is likely to change. An unrelated change in the looks of a button may break the test.
If we switch to using the text of the button ("Save"), that's better because that is what the user is likely to rely on too when they try to find a save button. But it's still not perfect as the text of the button could change.
The final step is to make a little library function that finds a save button within a certain form. Then we can encode the logic of the test but vary the kinds of text that are considered "save" (or even the method of finding a save button - perhaps a standard CSS class of save buttons in the future!); the test logic itself remains "permanent" since it doesn't rely on any implementation detail anymore.
The example wasn't perfect, but no, it wouldn't be that. What if in the future we have a screen that needs to show more than one save form? Then we'll need to stop using the "save" element id as a mechanism to denote save buttons; to do that, we need to update all the tests that rely on this mechanism.
The small library function would be `getSaveButton(form)` or even `save(form)` - now every form save test no longer encodes the knowledge of how a form's save buttons are made, whether that's by using a certain ID, class, text or anything else.
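A minimal sketch of such a helper, written in Python against a made-up in-memory form model rather than a real browser driver. `Button`, `Form`, `SAVE_LABELS` and the `btn-save` class are all assumptions for illustration; only the names `get_save_button` / `save` come from the comment above:

```python
from dataclasses import dataclass, field


@dataclass
class Button:
    # Hypothetical stand-in for whatever element type the UI driver exposes.
    text: str
    css_classes: tuple = ()
    clicked: bool = False

    def click(self):
        self.clicked = True


@dataclass
class Form:
    buttons: list = field(default_factory=list)


# Assumed set of labels that currently mean "save"; adjust in one place.
SAVE_LABELS = {"save", "save changes"}


def get_save_button(form):
    # The only place that knows how a save button is recognised today
    # (by label, or by an assumed standard CSS class).
    for button in form.buttons:
        label = button.text.strip().lower()
        if label in SAVE_LABELS or "btn-save" in button.css_classes:
            return button
    raise LookupError("no save button found in form")


def save(form):
    # What the tests actually call: "save this form", however that is done.
    button = get_save_button(form)
    button.click()
    return button
```

A test then reads `save(form)` followed by assertions on the outcome. If the label or the lookup mechanism changes, only `get_save_button` needs updating.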
Now when we get that requirement for a screen with two forms, we'll no longer get mad and try to attack the idea (two save forms on one screen? that's inconsistent with our product, it's confusing the users, etc etc) when the real reason is that it creates pain updating our tests. Instead we just update the save function.
The general idea is to encode the meaning of the test and separate the implementation details (clicking the save button might even be too concrete - "saving the form" is probably about the right level of abstraction). A good way to do this is to describe this test in prose and check if it's encoding details that may vary - does this sound like something universally true / something that will be true forever, or accidentally true due to current circumstances?
The author does not cover any questions related to application security. Things like: is this parameter/input value properly sanitized, is this piece vulnerable to injection attacks, does this piece of code perform authentication/authorization checks? Is RBAC properly implemented for this method?
I agree with some cases, but "is this parameter/input value properly sanitized" is a bit weird. It should only ever apply to a) the db framework, b) those N really weird cases that have to break the abstraction and don't use the db framework. If you have to test every input, then the problem is on a completely different level than missing a test.
Kind of. If you have a centralized place to perform input data validation, as you should, then it is just a matter of testing that piece of code, the same as if you were using a framework. However, I don't understand why you refer to a db in the first place. Is it because I used the injection attack as an example? If that's the case, bear in mind that injection targets other interpreters as well, not only a db.
But getting back to my original idea, what I want to highlight is the need to add test cases that cover application security.
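As one hedged illustration of a security-oriented test case, here is a sketch in Python using the standard `sqlite3` module. The `users` table and both lookup functions are made up for the example, and the unsafe variant is deliberately vulnerable, only there to show what the test is meant to catch:

```python
import sqlite3


def make_db():
    # In-memory database with two made-up users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("joe",), ("ann",)])
    return conn


def find_user(conn, name):
    # Parameterized query: the driver treats `name` purely as data.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


def find_user_unsafe(conn, name):
    # Deliberately vulnerable: the input is spliced into the SQL text.
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{name}'"
    ).fetchall()


def test_lookup_resists_injection():
    conn = make_db()
    payload = "' OR '1'='1"
    assert find_user(conn, "joe") == [("joe",)]
    assert find_user(conn, payload) == []  # payload matches no user
```

Run against `find_user_unsafe`, the same payload returns every row. The same shape of test (hostile input in, assert nothing leaks out) applies to other interpreters such as shells or template engines, though the details differ.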
Perhaps this part is not clear / well defined, but roughly, I meant that code coverage is (about) 100% for the lines added/changed, and some "reasonable" subset of possible breaking changes would be picked up by failing tests.
What I had in mind specifically in the answer, was the case of changing "interfaces" between parts of code. For example, the case of changing a function's arguments, or how it uses them, but omitting to change a call site. Checking that the call site just calls the function would not be enough to produce a failing test, especially without type safety. The test would actually need to assert on what the function does, e.g. its return value for a pure function.
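A small sketch of that situation in Python, using `unittest.mock`. The functions `format_price` and `receipt_line` are hypothetical: the callee has gained a required argument and the call site was missed. A test that only checks "the function was called" still passes, while one that asserts on the result does not:

```python
from unittest import mock


def format_price(amount, currency):
    # Callee after the change: `currency` is a newly required argument.
    return f"{amount:.2f} {currency}"


def receipt_line(item, amount):
    # Call site that was missed: still uses the old one-argument form.
    return f"{item}: {format_price(amount)}"


def call_only_check_passes():
    # Replaces the callee with a plain Mock, which accepts any arguments,
    # then only checks that a call happened.
    fake = mock.Mock(return_value="9.99 EUR")
    with mock.patch.dict(globals(), {"format_price": fake}):
        receipt_line("tea", 9.99)
    fake.assert_called_once()
    return True


def behaviour_check_passes():
    # Exercises the real callee and asserts on what comes back.
    try:
        return receipt_line("tea", 9.99) == "tea: 9.99 EUR"
    except TypeError:
        # In a real test runner this would surface as an erroring test.
        return False
```

Here `call_only_check_passes()` returns `True` despite the broken call site, and `behaviour_check_passes()` returns `False`. Building the mock with `mock.create_autospec(format_price)` would also catch the arity mismatch, but not a change in how the arguments are used.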
Yes, I think trying to test against every single possible breaking change is not valuable.
Meaning: if I break the feature, will I know about it from a test failure?
Basically, you put the customer first. Make sure your features are tested such that they can't fail without a failing test before writing lower-level tests.
This is also the approach advocated by the popular book Growing Object-Oriented Software Guided by Tests.
That's actually, IMHO, a good asset that comes out of a good test and code coverage.
I would be worried if, after adding a new piece of code or modifying existing code, there were no test to tell me that something is broken due to my change.
Looks like Michal has been immersed in Haskell for at least the past year. I wonder if he will have something to say about balancing testing and coverage with static typing.
Thinking about this, at the moment, I don't think static typing is a replacement for testing (or vice versa!). Although, apparently, with something like LiquidHaskell, you can get more logic "into" the types and checked by the compiler, but I'm unsure how much you can do when it comes to more complex business logic.
Regarding testing "glue": static typing often gives evidence (but not proof) that code is glued together appropriately [refactoring even small projects without tests in Haskell is a joy: the compiler essentially tells you what to change]. However, it doesn't give evidence that the high level behaviour is what it needs to be. So I think higher level tests are still needed.
I think maybe changing the first question from...
> Am I confident the feature I wrote today works because of the tests and not because I loaded up the application?
to...
> Am I confident the feature I wrote today works because of the tests and type checking, and not because I loaded up the application?
will probably help you to answer the question about how much static types allow you to forgo tests. My instinct is that in most cases, high level tests are still worthwhile.
I'm thinking much the same, though I do think of tests as looking after the dynamic behaviour and types after the static behaviour. Which seems obvious in retrospect, but once you wrap your head around it you can build neat abstractions like lightweight static capabilities: https://github.com/yawaramin/lightweght-static-capabilities