If you start writing unit tests too early in the project, you're effectively locking down units of code which haven't yet proved themselves to be useful to your project.
If you build square wheels for example, you may not realize that they're not designed correctly until you attach them to the car and realize that the car doesn't function well with them.
It makes no sense to preemptively write unit tests for a component which has a very high likelihood of not being in its final desired state. You're just giving yourself more work refactoring the unit tests over and over; or worse, you're afraid of refactoring the tests, and so you lie to yourself that square wheels are fine.
Integration tests are by far the most useful tests to have at the beginning and middle stages of the project; integration tests allow you to keep your focus on the real goals of the project and not get stuck on designing the perfect square wheel.
In my mind you only need one kind of testing: feature tests. Does the application provide the expected output for a given input and/or configuration? Either the application does everything it claims to do, in a very precise way, or it doesn't. Everything else is extraneous, though you may have to test the application in wildly different ways, with wildly different means of automation, to validate all feature support.
Feature tests are executed by running the application with a given input and configuration and comparing the result against a known expected output. Provided sufficient feature coverage this is enough to test for regression. While the feature tests are running each test runs against a clock so that dramatic swings in performance can be qualified with numbers. When that is not enough, as in the case of accessibility, have people (actual users) perform feature tests.
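The idea above can be sketched in a few lines. This is a minimal, hypothetical illustration: `run_app` stands in for the real application entry point (in practice you'd shell out to the built binary or hit a running instance), and the uppercasing behavior and two-second budget are invented for the example.

```python
import time

# Stand-in for the real application entry point; in practice this would
# invoke the built binary with the given input and configuration.
def run_app(input_text: str, config: dict) -> str:
    # Hypothetical behavior: uppercase lines when config["upper"] is set.
    lines = input_text.splitlines()
    if config.get("upper"):
        lines = [line.upper() for line in lines]
    return "\n".join(lines)

def feature_test(name, input_text, config, expected, budget_seconds=2.0):
    start = time.perf_counter()
    actual = run_app(input_text, config)
    elapsed = time.perf_counter() - start
    # Compare against the known expected output...
    assert actual == expected, f"{name}: output mismatch"
    # ...and run each test against a clock, so dramatic swings in
    # performance show up as numbers rather than vague impressions.
    assert elapsed < budget_seconds, f"{name}: took {elapsed:.3f}s"
    return elapsed

feature_test("uppercases input", "a\nb", {"upper": True}, "A\nB")
```

The same harness works whether `run_app` is an in-process call or a subprocess; only the setup cost changes.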
Other problems like bad design, redundant features, or poor code quality are qualified using code reviews and validation tools, like a linter.
For example, I worked on payroll software at some point. After finding a bug, we'd want to ensure that could never happen again, and would want to add a test for it going forward. So for example, I may have needed to write a test around someone who worked in one city in Ohio, lived in another, previous income above some amount, and with certain tax advantaged benefits. Setting up this test via feature testing is certainly possible, but it's likely the test itself will take a significant amount of wall time to execute. It's way faster to just test the payroll calculation code, which means you can run the tests more often, and with less developer inconvenience.
Have automated feature tests that test most of the "happy path" run-throughs of features that your users do on the front end, and then unit tests that test the minutiae.
My favorite project that I ever worked on was one that was set up this way. We had literally hundreds of feature tests, and thousands of unit tests. When running in a single thread on a local machine, it would take two hours, but when running on Circle CI with parallelization, it would run the entire suite in 6 minutes.
This enabled us to release features to production with a high amount of confidence, any time we wanted.
It was not uncommon for us to release bug fixes to production while still on the phone with the complaining customer. We earned a lot of customer loyalty points any time we pulled that off. And the best part was that, because we had the massive feature test suite running as part of CI/CD, we were able to do that with the same confidence we would have had with a massive QA team of 50 people testing our complete app before every deployment. It was awesome.
We just released a fairly large system that took this testing approach - despite a large number of tests hitting a database, file system and a blob storage emulator, they still completed in only 5 minutes or so on our CI machine (or 1 minute locally on very beefy laptops).
Why? What about your application changed so that it is slower when testing compared to real-world use? If anything it should be dramatically faster, because people don't provide microsecond-accurate automated responses. If the application naturally executes very quickly, I would imagine it would take far longer to set up the test scenario than to execute against it.
In my own applications if they take more than two seconds to deliver a response (even for 5mb input) then at the very least I have a critical performance defect. There are not many administrative tasks I can complete from start to finish in that frame of time.
I need to make 8 HTTP requests to properly set up my test from a clean slate, including setting up the deductions, previous earnings, company location, and employee home address. If each of those requests takes 50ms, and then I need to make a request to actually execute my test and verify everything went well, that could easily be 500ms.
And that's just what needs to happen to run one end to end test of the payroll calculation feature. It's absolutely valuable to have a few such tests, but I'm not going to run tests for all the weird one off scenarios (like what happens if someone lives in NJ, and works in Yonkers, and the company has a location in NJ as well?) because they'd take minutes to run. I could run that entire test as a unit test in 2ms for the payroll calculation. That lets me run all of the tests I need very quickly.
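A sketch of what "run it as a unit test in 2ms" might look like. Everything here is invented for illustration: the function, the rates, and the credit rule are grossly simplified stand-ins, not real Ohio or New Jersey tax law.

```python
# Hypothetical, grossly simplified local payroll calculation. The work
# city taxes gross pay; the home city grants a credit for some
# percentage of the tax already paid to the work city.
def calculate_local_tax(gross, work_city_rate, home_city_rate, credit_pct):
    work_tax = gross * work_city_rate
    credit = min(work_tax * credit_pct, gross * home_city_rate)
    home_tax = gross * home_city_rate - credit
    return round(work_tax + home_tax, 2)

# The "weird one-off scenario" as a plain unit test: no HTTP setup, no
# clean-slate requests -- it runs in microseconds, so you can afford to
# enumerate every odd city/residence combination.
def test_full_credit_for_work_city_tax():
    # Works in one city (2% rate), lives in another (1% rate), home
    # city credits 100% of work-city tax, capped at the home-city tax.
    assert calculate_local_tax(1000.0, 0.02, 0.01, 1.0) == 20.0

def test_no_credit():
    assert calculate_local_tax(1000.0, 0.02, 0.01, 0.0) == 30.0

test_full_credit_for_work_city_tax()
test_no_credit()
```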
I don't know about jeffasinger's company, but in optimal circumstances, a test instance of our app running against an in-memory H2 DB takes 3 minutes to start. The app internally already performs heavy lifting in parallel, so it's also not clear that running multiple instances of the app will make it that much faster...
You can build a desktop version of the software with hardware mocked out, which is very useful, but then you're doing something much more akin to an integration test.
An entire unit test suite should complete in well under 2 seconds.
Real world use hits an expensive optimized production cluster. Tests run on my laptop or a cheap testing machine.
Real users don't use every feature every time they perform an action.
A test case at the feature level will likely require several requests if it's starting from a clean state.
You say request so I guess you are talking about a network service request. When I first got into A/B testing many years ago we found we could use it to simulate traffic through the entirety of the Travelocity web site from homepage to checkout and back. They never used the A/B test system for test automation, but they could have. Back then the checkout pages were rarely tested with valid feature tests because they were at the end of the path. There were plenty of unit tests isolated to small bits of functionality at those pages. We discovered, thanks to session recording with things like Tea Leaf and Clicktale, just a bit later that those unit tests were worthless. The checkout pages failed all the time and cost the company tremendously in lost revenue. Some of those failures would show up in our A/B test experiments though.
My lesson learned from this is that testing things in isolation isn't reliable and provides false confidence. You tend to get back the answer you wanted, such that it amounts to a self-fulfilling prophecy as opposed to a criticism.
Rather, I'm arguing against feature tests to the exclusion of unit tests. Both are required in a decently tested system.
Feature tests provide the assurance that the application works as expected and prevent broken changes going to production, as you've pointed out.
However, they're not a practical replacement for the instant feedback that unit tests provide. Our feature tests currently take 1.5->2.5 hours to run depending on CI system load.
That's... not great. On a given workday I get to try 3-4 builds? I don't know how others work, but personally I like to get some feedback every 10-25 lines of code or so. Unit tests provide that. They take 2 minutes to run. If I had to wait 1.5 hours every 25 lines of code, I would sure be much less productive.
Let's say I somehow convinced management to pay for 20x as many compute resources for CI so that I could get those results in 5-10 minutes rather than 1.5-2.5 hours. And 5-10 minutes is the best case, as that's pretty close to the overhead for our CI system to just clone the code, get the secrets to deploy to a test environment, and actually deploy to said test environment.
That's still my flow broken every time I need a test run, because I'm not going to sit and stare at build for 10 minutes, I'm going to answer that email from my product manager or review a team member's PR. If that build comes back with a failure, then I'm context switching back to actually fix it.
The point is that this is a funnel. The unit tests are for instant feedback and to catch obvious mistakes, like "Hey, that value might not be defined, you still need to calculate value B, hope you didn't remove/break the fallback calculation". This ensures that fewer mistakes incur the full 2.5-hour penalty of a CI build that ultimately fails a test. The feature tests are there for safety. The unit tests are there for productivity. The feature tests are "Does this feature work so we can deploy it". In theory you have at most 1-2 runs of these per feature, as unit tests _should_ catch issues before they get there. They're simply too expensive in terms of time to be your first layer of defense; rather, they're your final layer.
The hole you get into is: lots of units are changed, a feature quits working, everybody says "it's not me!", and it doesn't get fixed.
Test automation is not a remedy to prop up broken leadership. I would appoint an arbitrary owner of the defect, and that person would visit with other people as necessary to remove or dismiss various functionality from blame, one by one. Once the appropriate collision of changes is discovered, it will become clearer how to address resolution.
I was a tester on a project where the primary functionality was a workflow that took 40 minutes minimum to complete, often much longer (it involved migrating servers from one cloud provider to another), and the dev team had this same philosophy: no writing unit tests, just acceptance/feature tests run against live environments, which seemed okay - it was the start of the project, a new codebase, and it probably worked out well on a previous project.
But it was kind of a disaster on this project. With only feature testing: a dev makes a change, updated code is deployed for test, automated acceptance tests kick off and an hour later the tests fail, then they look through the app logs and finally discover a NoneType error or a KeyError or something (Python's dynamic typing did not help either). Then they fix the code and repeat the whole process. Hours later you've got one change tested and merged. Then it goes out to a pre-prod environment for in-depth feature testing with more permutations of inputs, and we'd find trivially broken code there too.
That's a worst case scenario, but it happened all the time. It just took way too long to run all the feature tests needed to get good coverage of code paths. We'd literally waste full days trying to get something out to production because of the slow dev/test cycle. Eventually, the devs caved and added unit tests because we ran into so much broken code.
I never appreciated unit tests much until after that project, especially for dynamically typed languages. It's the quickest and easiest way to weed out trivial issues in the code - human eyes will not catch everything in peer review and feature tests with a sufficient number of input permutations can take way too long to run.
I don't agree with the "100% test coverage" mentality, but I think there is value in unit and integration tests.
There are very specific reasons for each level of testing and the benefits stack.
You (the programmer) write the unit tests too, and even well-written code with well-written unit tests will have bugs in it, whether due to logic errors or due to mocks.
>Please tell me all of the products that your code is contained in. I don't want to die in a self-driving car because it has superficial test coverage.
I just don't think this was the best way to go about your argument. Why attack the poster? Why not link your own papers about the benefits of unit tests?
I write unit tests; I tend to write a lot of them. They get added to as time goes on, as bugs are found, etc. (the case where he was suggesting feature tests only). When you have a lot of code coverage and you find a bug (a logic error), your tests should either change due to a change in expectations, or they should be added to.
Mocks are really terrible for unit tests. Those tests then become integration tests, because you're not testing the unit of code, you're testing a system under test. The unit test should test only the code the unit contains (not whether you can connect to a faked-out driver to the DB). Use interfaces and stubs to avoid that.
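The interface-plus-stub idea might look like this sketch (the names and behavior are hypothetical). The stub is a dumb canned implementation of the interface; there's no mock framework and no assertions about *how* the store was called, so the test exercises only the unit's own logic.

```python
from typing import Optional, Protocol

# The unit under test depends on an interface, not on a concrete DB driver.
class UserStore(Protocol):
    def find_email(self, user_id: int) -> Optional[str]: ...

# The unit of code being tested.
def greeting_for(store: UserStore, user_id: int) -> str:
    email = store.find_email(user_id)
    return f"Hello, {email}" if email else "Hello, guest"

# A stub: canned answers, no behavior verification, no framework.
class StubStore:
    def find_email(self, user_id: int) -> Optional[str]:
        return "a@example.com" if user_id == 1 else None

assert greeting_for(StubStore(), 1) == "Hello, a@example.com"
assert greeting_for(StubStore(), 2) == "Hello, guest"
```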
That was because the justification came from a lack of discipline. I don't have the time to teach someone who doesn't want to do the work to test their code, nor do they care to understand why testing at those levels is good. I've been through these arguments with others. From my experience, they never seem to catch on that the number of combinations to test is going to get tiresome very quickly, that they're going to miss things in the tests, that the coverage reports are more inflated than the VW scandal (yes, instrumentation says that line was hit, but was the value tested?), and that they leave landmines around for anyone else who touches that code later. If you take any stake in what Venkat says, code quality is about how you feel about others. In their case: people who only write feature tests regularly leave their mess lying around and want other people to deal with it.
DHH is an example of this; in his talk he was upset about how much he had to verify the small details. (That's good, that he's upset: it shows that he should start looking at creating new libraries that isolate him from that. It's not a good reason to say "oh, we don't need a test to verify `def add2(a:Int, b:Int) = a+b`".)
If that's the case then on this forum please ignore them rather than attacking them. We were all inexperienced once.
I disagree slightly. If you're writing unit tests such that the code itself can't change without breaking the tests, perhaps the tests have been too tightly coupled.
The end goal of unit tests is to secure the expected functionality of the unit of code such that the tests will fail if the functionality stops working. But if changes to the code (refactoring, changing the ordering of certain lines, etc.) cause the tests to break, then what was tested wasn't the functionality but the implementation. You might as well write a test that loads each source file as a string and verifies it equals what you've written.
An example: my favorite set of tests I've written lately were for some classes to handle deduplication of a large set of records. One implementation wrote to disk (for data sets too large to fit into memory) while the other did it purely in memory. Since their functionality was supposed to be identical, I wrote one set of tests that both ran on. The tests could not be specific to the code written because it had to work on two sets of code. If we wrote a third implementation, the same tests should work for it as well.
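A sketch of that pattern, with invented deduplicator classes standing in for the real ones (the on-disk version here is deliberately naive to stay short). Because one suite must pass for every implementation, the tests physically cannot couple to either implementation's internals:

```python
import os
import tempfile

# Hypothetical in-memory implementation.
class InMemoryDeduper:
    def dedupe(self, records):
        seen, out = set(), []
        for r in records:
            if r not in seen:
                seen.add(r)
                out.append(r)
        return out

# Hypothetical on-disk implementation: tracks seen keys in a temp file
# (deliberately naive; the real thing would be for data sets that don't
# fit in memory).
class OnDiskDeduper:
    def dedupe(self, records):
        out = []
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "seen")
            open(path, "w").close()
            for r in records:
                with open(path) as f:
                    seen = set(f.read().split())
                if str(r) not in seen:
                    with open(path, "a") as f:
                        f.write(f"{r} ")
                    out.append(r)
        return out

# One contract suite, run against every implementation. A third
# implementation would reuse exactly these checks.
def check_contract(deduper):
    assert deduper.dedupe([]) == []
    assert deduper.dedupe([1, 1, 2, 3, 2]) == [1, 2, 3]
    assert deduper.dedupe(["a"]) == ["a"]

for impl in (InMemoryDeduper(), OnDiskDeduper()):
    check_contract(impl)
```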
(But what you said about integration tests: 100% agree)
Sounds a bit like C#'s code contracts. A team I was on tried it some years ago, and I think they were a great substitute for a bulk of the unit tests we wrote. It didn't catch on, unfortunately.
In my experience the only time when unit tests were useful was when they were integration tests at the same time.
For example, when writing a data structure, algorithms, or a programming language/compiler. There is a clear boundary to such a system, and one can reason about it in terms of input and output, as a black box. As such, the test for this system is an integration test, but due to its isolation it looks so simple that someone would call it a unit test. A second example would be all the TDD blog posts, where the author tests some trivial entity with perfect isolation. That confuses integration tests with unit tests again.
The biggest mistake people make when they write tests (or library functions) too early is that they miss the target. The first part of writing code is not to produce a clean and maintainable system, but to get the requirements right. Listening to people seldom works; it is usually better to observe people interacting with some prototype and infer requirements from that. In the words of Henry Ford, the goal is to find that the requirements are for a "car" and not just for "faster horses". People who fossilize the requirements too early miss the target completely, in my experience.
We get some bugs/issues that could be identified by better coverage, but most of our "bugs" stem from real world issues or differing views on functionality. Getting that right takes a ton of effort that simply isn't worth it right now.
I personally like the approach of writing a small amount of "obvious" tests to detect large functional breakage, and add a test for a bug once it's been identified aka a regression test.
Imo it's maximum test value per effort. Of course it depends on how impactful a bug could be.
This is only true for development environments which have particular weaknesses. If your tools for code changes and refactoring easily cover changing your unit tests while you are refactoring, then you can use unit tests for rapid development. The use of Unit Tests in present day "agile" methods actually originates in an environment where this was the norm. (Extreme Programming, as developed at the Chrysler C3 project in VisualWorks Smalltalk, using the phenomenal Refactoring Browser.)
Your statement above would be 180 degrees turned around in an environment like that.
If you start writing unit tests too early in the project, you're effectively locking down units of code which haven't yet proved themselves to be useful to your project.
Reading this just makes me sigh. Not all environments have the same cost/benefit factors in all parts of the edit/test/debug cycle. Not all environments and languages have the same costs for doing similar development operations. As a matter of fact, these are generally very, very different, even across projects using the same language. Change those cost/benefit factors, and the pronouncements you are making become more valid or less valid, depending.
Not understanding that those cost/benefit factors can change and change exactly what your best practices are -- this is one of the most pervasive big misunderstandings that fosters poor management of programming projects.
(EDIT: I think a lot of the annoyance many people have with "agile" methods, stems from practices which work well in one environment migrated to another environment where they don't work well at all!)
The major issue is that no one ever learns or talks about the "Unit" in unit tests. A Unit is supposed to be a component, albeit a very small one; in OOP, that normally translates to a Class.
Your unit is a given Class; assuming its API doesn't change, the unit tests should continue working. If you haven't settled on an API for that unit yet, the tests are going to be coupled to a quickly moving target, and that, in my opinion, is a waste of time.
That said, people don't really design their Classes as units in a bigger set, with self-contained APIs and SOLID principles. That's the first mistake. And that then leads to tests which assert not a Unit, but some random piece of logic within a Unit, or which assert against things that are not Units of their own, but already tightly coupled and leaky containers. Which ends up producing both badly designed software that is inflexible and quick to rot, and poor tests which prevent you from making any meaningful design improvements without all the tests breaking.
Good unit tests (like doctests) also provide superior documentation and examples.
Sure, worrying about documentation too soon is bad, but sometimes when doing early research, getting details clarified and written down is paramount and tests, maybe even unit tests, are a nice tool to have when you want it.
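A doctest is exactly this kind of dual-purpose artifact: the examples in the docstring are documentation a reader can trust, because `python -m doctest` executes them and fails if the output drifts from the code. The function here is a made-up example.

```python
def normalize_phone(raw: str) -> str:
    """Strip punctuation from a US phone number.

    The examples below are executable documentation: doctest runs them
    and compares the actual output against the shown output.

    >>> normalize_phone("(555) 123-4567")
    '5551234567'
    >>> normalize_phone("555.123.4567")
    '5551234567'
    """
    return "".join(ch for ch in raw if ch.isdigit())

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # exits nonzero output on any failing example
```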
Who hasn't wanted to test a private method, before they saw that a unit is more than a function?
When I am testing an application that is basically a layer between my app and a database, it does not make sense to mock the database. What, then, am I actually testing?
When I am testing a package that is pure logic, it makes sense to test it in isolation.
When I am building a user interface that is in flux, the best tests are manual.
My definition of a unit test: The smallest piece of functionality that can be tested in isolation and provide value.
This does not test a damned thing: assert.equal(dbQuery, "SELECT * FROM users")
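One alternative to asserting the SQL string is to run the layer against a real, but throwaway, database. A sketch using SQLite's in-memory mode (the schema and query are invented for the example): the test now asserts on behavior, so rewriting the query doesn't break it as long as the results stay correct.

```python
import sqlite3

# The data-access layer under test (hypothetical).
def fetch_user_names(conn):
    return [row[0] for row in conn.execute(
        "SELECT name FROM users ORDER BY name")]

# Real database, zero infrastructure: SQLite in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("bea",), ("al",)])

# Assert on what the layer *does*, not on the SQL text it happens to emit.
assert fetch_user_names(conn) == ["al", "bea"]
```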
"Intuitively, one can view a unit as the smallest testable part of an application. In procedural programming, a unit could be an entire module, but it is more commonly an individual function or procedure. In object-oriented programming, a unit is often an entire interface, such as a class, but could be an individual method. Unit tests are short code fragments created by programmers or occasionally by white box testers during the development process. It forms the basis for component testing.
Ideally, each test case is independent from the others. Substitutes such as method stubs, mock objects, fakes, and test harnesses can be used to assist testing a module in isolation. Unit tests are typically written and run by software developers to ensure that code meets its design and behaves as intended."
It's false that the purpose of unit tests is to lock down the project's source code. Unit tests have multiple purposes, but that isn't one of them. The main purpose of unit tests is to allow developers to quickly reach confidence that code works, be it a code change or new code.
There is no such thing as writing unit tests too early.
Someone's implemented a package for doing it with Go which looks good: https://github.com/zimmski/go-mutesting
The problem I had is that there were many, many mutations which led to code that was functionally identical, just a bit slower, or took a different route. You have to manually verify every mutation to check this.
For one mutation it took me 30 minutes to convince myself the algorithm was still doing the same thing, just in a different way.
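A made-up miniature of the "equivalent mutant" problem: if the inputs are known to be integers, the mutated predicate below is the same predicate, so no test can ever kill the mutant, and a human has to stare at it to conclude the tool's report is a false alarm.

```python
# Original.
def first_positive_index(xs):
    for i, x in enumerate(xs):
        if x > 0:            # original comparison
            return i
    return -1

# Typical mutation-testing output: `> 0` mutated to `>= 1`. For integer
# inputs this is an *equivalent mutant*: x > 0 and x >= 1 are the same
# condition, so every test that passed before still passes.
def first_positive_index_mutant(xs):
    for i, x in enumerate(xs):
        if x >= 1:           # mutant survives all integer test cases
            return i
    return -1

for case in ([0, -1, 3], [], [-5], [1, 2]):
    assert first_positive_index(case) == first_positive_index_mutant(case)
```

(With float inputs like `0.5` the two would differ, which is exactly the kind of reasoning you end up doing by hand for each surviving mutant.)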
I usually use coverage as a tool to remind me of pieces of code I intended but forgot to test, so setting it to an arbitrary percentage is not that useful, in my opinion. And since 100% is usually not feasible either, that makes mutation testing in general not worth the effort.
Again, take my opinion with a grain of salt.
Often an integration test would catch the multiple mutations also being caught by (different) unit tests.
I assume that you mean that if a certain (broader) test kills the same mutants as X unit tests, those X tests are not really necessary?
I've looked into https://github.com/boxed/mutmut and https://github.com/sixty-north/cosmic-ray for Python project, and there it is only important that a mutant gets killed, but not how often and by which tests (therefore you can use `pytest -x` to continue with the next mutation after the first test failed due to it).
In my experience, you usually end up with much more coverage than you want or need.
That's completely backwards compared to how you should be doing things.
You either write the test:
1. to verify the existence of a bug by reproducing it, or
2. to formalize the spec for yet-to-be implemented code/feature.
And then you make the test green. Retroactively writing tests for working code, only to sabotage the code... Seems like an odd way of doing things.
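Case 1 in miniature, with an invented float-rounding bug: the test is written first, it is red against the buggy code (it reproduces the bug), and the identical check goes green once the fix lands. The function names and the bug are made up for illustration.

```python
# Hypothetical bug report: some totals come out wrong by a fraction of
# a cent because amounts are summed as floats.
def add_amounts_buggy(a_dollars, b_dollars):
    return a_dollars + b_dollars  # 0.1 + 0.2 -> 0.30000000000000004

# The fix: do the arithmetic in integer cents.
def add_amounts_fixed(a_dollars, b_dollars):
    cents = round(a_dollars * 100) + round(b_dollars * 100)
    return cents / 100

# Step 1: the regression test reproduces the bug -- it would be red
# against the buggy implementation.
assert add_amounts_buggy(0.1, 0.2) != 0.3

# Step 2: the same expectation goes green against the fix.
assert add_amounts_fixed(0.1, 0.2) == 0.3
```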
e.g. I'm using a 3rd-party transpiler / build tool which I don't know all the details of; my test runner wouldn't do exactly the same transpilation of the code as this build tool, and it took me about an hour to figure out where the build tool's config file was and how to get it working with the test runner.
Did I learn more about my tooling and code base? Yes. Was it useful? Well, maybe not, as I might kill this project shortly anyway.
Vague/changing reqs is of course a separate problem.
I feel like that is a valid form of our craft, and it doesn't play well with TDD. It's not the primary way that I prefer to work, but I'm certain I'll be there later today (I've got a bastard of a sprint task waiting for me).
I do exploratory coding in a similar way, usually with a repl next to my text editor, and this first single file version is usually always trash but helps me understand the problem space. Then I write it "for real" with the knowledge I've gained.
You _could_ just assume your test is correctly written, but I personally prefer to be sure by seeing it red at least once.
I'm not arguing against red tests. I'm arguing for them.
What I'm arguing against is 1. writing the finished code, 2. writing green tests to "prove" the code correct, 3. trying to sabotage the finished code to "prove" that the tests works by making them red.
Sounds crazy? That was what OP said he was doing!
Anyway, we all agree testing is good. And that a test you've never seen fail is bad. Personally, I try to delete exploratory code and try to use tests to drive out the _real_ implementation, but I'm going to admit that at least sometimes I'll write code that's inherently "safe", and then add a test afterwards for garbage inputs, for example.
In that case you’re not trying to prove the tests “correct” though, but instead trying to prove good coverage.
I’d argue there are several ways to do that, and that’s certainly an interesting approach to the problem.
In either case you’ll have trouble being 100% confident though, so I guess it’s a matter of deciding when enough is enough.
IMO the best approach is, first, to integrate test coverage into the code reviews, because there is no hard rule, and second, to write property-based tests.
(I wrote more about this here: https://vincenttunru.com/100-percent-coverage/)
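A property-based test asserts invariants that must hold for *any* input, then throws many generated inputs at them. The sketch below hand-rolls the generation with `random` to stay self-contained; in practice a library like Hypothesis does the generation, shrinking, and reporting far better. The `dedupe` function is an invented subject.

```python
import random

# Invented function under test: order-preserving deduplication.
def dedupe(xs):
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# Properties that must hold for every input, checked over many random
# cases -- this covers ground that a handful of hand-picked examples
# would miss.
random.seed(0)  # deterministic run for CI
for _ in range(200):
    xs = [random.randint(0, 9) for _ in range(random.randint(0, 20))]
    out = dedupe(xs)
    assert set(out) == set(xs)           # no elements lost or invented
    assert len(out) == len(set(out))     # no duplicates remain
    assert dedupe(out) == out            # idempotence
```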
I think this might apply when you write your own algorithms and want to test them. But if you are, like most of us, working on Business Logic, then you are probably writing the wrong tests.
We usually want to know whether a workflow or a customer story works as intended and as such we should write more Integration tests.
The idea itself has merit though and draws parallels to writing tests with variable or random input parameters, I am pretty sure there already was something on this.
while (car.wheels < 4)
There are some cases where it's not a big deal. '<' and '<=' can be interchangeable if one or both of the values are a rough estimate. One example would be comparing two images and doing something if they are a close match. It might not be too bad if you use `<= 0.995` instead of `< 0.995` when checking the percentage.
Some other examples might involve accelerometer data, pressure sensors, GPS boundaries, debounce algorithms, animations, etc. But I think it's just a good habit to write test cases that test those boundary values.
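That habit, written down (the `needs_wheel` function is a toy stand-in for the `car.wheels < 4` condition above): test just below, exactly at, and just above the boundary, so an accidental `<=` or off-by-one flips at least one assertion.

```python
# Toy stand-in for the `car.wheels < 4` condition.
def needs_wheel(wheel_count, required=4):
    return wheel_count < required

# Boundary-value tests: the three values that surround the comparison.
assert needs_wheel(3) is True    # just below the boundary
assert needs_wheel(4) is False   # exactly at the boundary
assert needs_wheel(5) is False   # just above the boundary
```

If someone later mutates `<` to `<=`, the `needs_wheel(4)` case fails immediately.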
Conversely, I've worked in teams whose managers encouraged the developers to forgo tests for the sake of delivering faster. Never again.
Smart compilers and precise language (ala Haskell) obviate the need for writing unit tests.
Or am I missing something in the definition of code coverage?
I don't see anyone using exhaustive-input tests, which is usually impossible anyway, or splitting up each branch of conditionals into separate functions so as to make them unit testable.
I only see people splitting up the code into arbitrary function blocks, having at least one unit test for such function and then declaring all lines of code of that function as test-covered.
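A made-up illustration of that gap: a single test can execute every line of this function (100% line coverage on most tools, since the conditional expression sits on one line), while never checking the value the other branch produces.

```python
# Invented function: members get a 10% discount.
def discount(price, is_member):
    rate = 0.1 if is_member else 0.0   # one line, both branches "covered"
    return price * (1 - rate)

# This lone test hits every line, so line coverage reports 100% --
# yet if the member rate were accidentally changed to 0.0, it would
# still pass. The member branch's *value* is never tested.
def test_non_member():
    assert discount(100, False) == 100

test_non_member()
```

Branch or mutation coverage, or simply a second test asserting `discount(100, True)`, would expose the hole.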