99% code coverage (2017) (rachelcarmena.github.io)
100 points by fagnerbrack 37 days ago | 97 comments



The industry is obsessed with getting 100% unit test code coverage even though it doesn't mean anything to the project. The purpose of unit tests is to lock down the project's source code once it's essentially completed; to avoid regressions when making minor changes.

If you start writing unit tests too early in the project, you're effectively locking down units of code which haven't yet proved themselves to be useful to your project. If you build square wheels, for example, you may not realize that they're not designed correctly until you attach them to the car and realize that the car doesn't function well with them. It makes no sense to preemptively write unit tests for a component which has a very high likelihood of not being in its final desired state; you're just giving yourself more work to refactor the unit tests over and over; or worse, you're afraid of refactoring the tests and so you lie to yourself thinking that square wheels are fine.

Integration tests are by far the most useful tests to have at the beginning and middle stages of the project; integration tests allow you to keep your focus on the real goals of the project and not get stuck on designing the perfect square wheel.


It drives me nuts during interviews talking about test automation, because people are very particular about the type of testing, whether it's unit testing, integration testing, acceptance testing, or whatever.

In my mind you only need one kind of testing: feature tests. Does the application provide the expected output for a given input and/or configuration? The application does all that it claims to do in a very precise way, or it doesn't. Everything else is extraneous, though you may have to test the application in wildly different ways with wildly different means of automation to validate all feature support.

Feature tests are executed by running the application with a given input and configuration and comparing the result against a known expected output. Provided sufficient feature coverage this is enough to test for regression. While the feature tests are running each test runs against a clock so that dramatic swings in performance can be qualified with numbers. When that is not enough, as in the case of accessibility, have people (actual users) perform feature tests.
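
As a rough sketch of what such a feature test could look like (the app name, fixture paths, and time budget here are hypothetical, not taken from any particular project):

  # Run the application with a known input and configuration, compare its
  # output to a stored expectation, and record wall-clock time so dramatic
  # performance swings show up as numbers.
  import json
  import subprocess
  import time

  def test_report_generation_feature():
      start = time.monotonic()
      result = subprocess.run(
          ["./myapp", "--config", "fixtures/config.yaml", "fixtures/input.json"],
          capture_output=True, text=True, timeout=30,
      )
      elapsed = time.monotonic() - start

      assert result.returncode == 0
      with open("fixtures/expected.json") as f:
          assert json.loads(result.stdout) == json.load(f)
      # The budget is illustrative; the point is to qualify swings with numbers.
      assert elapsed < 2.0, f"feature took {elapsed:.2f}s"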

Other problems like bad design, redundant features, or poor code quality are qualified using code reviews and validation tools, like a linter.


I think your argument is missing the "cost" of feature tests, which is that they necessarily must run much slower than tests that are only testing a small unit of code.

For example, I worked on payroll software at some point. After finding a bug, we'd want to ensure that could never happen again, and would want to add a test for it going forward. So for example, I may have needed to write a test around someone who worked in one city in Ohio, lived in another, previous income above some amount, and with certain tax advantaged benefits. Setting up this test via feature testing is certainly possible, but it's likely the test itself will take a significant amount of wall time to execute. It's way faster to just test the payroll calculation code, which means you can run the tests more often, and with less developer inconvenience.
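
For a sense of the contrast, a unit test of just the calculation might look something like this (all class names, functions, and figures below are made up for illustration, not taken from the actual system):

  # Testing only the payroll calculation in isolation: no HTTP requests,
  # no database setup. Employee, Deduction and calculate_payroll are
  # hypothetical stand-ins.
  from decimal import Decimal

  def test_ohio_city_tax_with_pretax_deduction():
      employee = Employee(
          work_city="Columbus, OH",
          home_city="Dublin, OH",
          ytd_gross=Decimal("95000.00"),
          pretax_deductions=[Deduction("401k", Decimal("500.00"))],
      )
      paycheck = calculate_payroll(employee, gross=Decimal("4000.00"))
      assert paycheck.taxable_wages == Decimal("3500.00")
      assert paycheck.local_tax == Decimal("87.50")  # 2.5% of taxable wages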


This is why you actually need a mix of feature tests and unit tests.

Have automated feature tests that test most of the "happy path" run-throughs of features that your users do on the front end, and then unit tests that test the minutiae.

My favorite project that I ever worked on was one that was set up this way. We had literally hundreds of feature tests, and thousands of unit tests. When running in a single thread on a local machine, it would take two hours, but when running on Circle CI with parallelization, it would run the entire suite in 6 minutes.

This enabled us to release features to production with a high amount of confidence, any time we wanted.

It was not uncommon for us to release bug fixes to production while still on the phone with the complaining customer. We earned a lot of customer loyalty points any time we pulled that off. And the best part was that because we had the massive feature test suite running as part of our CI/CD, we were able to do that with the same confidence we would have had with a massive QA team of 50 people testing our complete app before every deployment. It was awesome.


This is my preference too - mainly feature tests, and unit and integration tests where they provide enough value to be worthwhile. I generally end up with 70-75% test coverage, but I don't pay too much attention to the actual number.

We just released a fairly large system that took this testing approach - despite a large number of tests hitting a database, file system and a blob storage emulator, they still completed in only 5 minutes or so on our CI machine (or 1 minute locally on very beefy laptops).


> but it's likely the test itself will take a significant amount of wall time to execute.

Why? What about your application changed so that it is slower when testing compared to real world use? If anything it should be dramatically faster, because people don't provide microsecond-accurate automated responses. If the application naturally executes very quickly, I would imagine it would take far longer to set up the test scenario than to execute against it.

In my own applications if they take more than two seconds to deliver a response (even for 5mb input) then at the very least I have a critical performance defect. There are not many administrative tasks I can complete from start to finish in that frame of time.


Sure, let's go back to my payroll example.

I need to make 8 HTTP requests to properly setup my test from a clean slate, including setting up the deductions, previous earnings, company location and employee home address. If each of those requests takes 50ms, and then I need to make a request to actually execute my test, and verify everything went well, that could easily be 500ms.

And that's just what needs to happen to run one end to end test of the payroll calculation feature. It's absolutely valuable to have a few such tests, but I'm not going to run tests for all the weird one off scenarios (like what happens if someone lives in NJ, and works in Yonkers, and the company has a location in NJ as well?) because they'd take minutes to run. I could run that entire test as a unit test in 2ms for the payroll calculation. That lets me run all of the tests I need very quickly.


Can't you run these feature tests in parallel? I'm running a system where it fully downloads and mounts containers, and executes them (in a loopback: they get hosted by transient webservers on the test host, aka my laptop, but possibly also Travis), about 50 of them in parallel with tons of database calls, and it usually takes around 10-20s to complete.


It only takes you 20s to start up 50 containers? I doubt I could start 50 alpine containers in that time on my company issue 2015 MBP (Edit: A quick test reveals my personal desktop can in fact accomplish this, but it's much more powerful than my work MBP and has no virtualization overhead since it's running native docker instead of docker for mac).

I don't know about jeffasinger's company, but in optimal circumstances, a test instance of our app running against an in-memory H2 DB takes 3 minutes to start. The app internally already performs heavy lifting in parallel, so it's also not clear that running multiple instances of the app would make it that much faster...


Singularity containers, my dev laptop is Linux, way less overhead than Docker on Mac. I'm simulating request load on a scheduler for compute jobs. In principle those jobs are not generally local, but it helps to stress-test the scheduler.


I'll give you one real example: because my application is some embedded software that has to be built, downloaded onto the target and then run. That is the only environment the real application runs in. Some tests should be run at this level, but weighing the cost to automate against the benefit means that you usually don't try to get full coverage.

You can build a desktop version of the software with hardware mocked out, which is very useful, but then you're doing something much more akin to an integration test.


Unless the software is firmware or an OS it rarely directly touches the hardware. It only knows of the hardware available because the firmware or OS provide that information. I have seen a lot of organizations fake this with an array of VMs or when that is not enough they have a bunch of spare hardware to test against, like a whole bunch of different models of Android phones.


> In my own applications if they take more than two seconds to deliver a response (even for 5mb input) then at the very least I have a critical performance defect.

An entire unit test suite should complete in well under 2 seconds.


That depends on the application, what you are testing, and how many unit tests there are. As absurd as it sounds, I have seen people force unit tests onto logic that does nothing more than validate the presence of testing, which is amazingly fast.


> What about your application changed so that it is slower when testing compared to real world use?

Real world use hits an expensive optimized production cluster. Tests run on my laptop or a cheap testing machine.


We have 1000s of features.

Real users don't use every feature every time they perform an action.

A test case at the feature level will likely require several requests if it's starting from a clean state.


> A test case at the feature level will likely require several requests if it's starting from a clean state.

You say request so I guess you are talking about a network service request. When I first got into A/B testing many years ago we found we could use it to simulate traffic through the entirety of the Travelocity web site from homepage to checkout and back. They never used the A/B test system for test automation, but they could have. Back then the checkout pages were rarely tested with valid feature tests because they were at the end of the path. There were plenty of unit tests isolated to small bits of functionality at those pages. We discovered, thanks to session recording with things like Tea Leaf and Clicktale, just a bit later that those unit tests were worthless. The checkout pages failed all the time and cost the company tremendously in lost revenue. Some of those failures would show up in our A/B test experiments though.

My lesson learned from this is that testing things in isolation isn't reliable and provides false confidence. You tend to get back the answer you wanted, such that it amounts to a self-fulfilling prophecy as opposed to a criticism.


jeffasinger explained this point better than me, but one thing to point out is that I'm not arguing for unit tests to the exclusion of feature tests.

Rather, I'm arguing against feature tests to the exclusion of unit tests. Both are required in a decently tested system.

Feature tests provide the assurance that the application works as expected and prevent broken changes going to production, as you've pointed out.

However, they're not a practical replacement for the instant feedback that unit tests provide. Our feature tests currently take 1.5->2.5 hours to run depending on CI system load.

That's... not great. On a given workday I get to try 3-4 builds? I don't know about how others work, but personally I like to get some feedback every like 10-25 lines of code. Unit tests provide that. They take 2 minutes to run. If I had to wait 1.5 hours every 25 lines of code, I would sure be much less productive.

Let's say I somehow convinced management to pay for 20x as many compute resources for CI so that I could get those results in 5-10 minutes rather than 1.5->2.5 hours. And 5-10 is the best case, as that's pretty close to the overhead for our CI system to just clone the code, get the secrets to deploy to a test environment, and actually deploy to said test environment.

That's still my flow broken every time I need a test run, because I'm not going to sit and stare at build for 10 minutes, I'm going to answer that email from my product manager or review a team member's PR. If that build comes back with a failure, then I'm context switching back to actually fix it.

The point is that this is a funnel. The unit tests are for instant feedback and to catch obvious mistakes, like "Hey, that value might not be defined, you still need to calculate value B, hope you didn't remove/break the fallback calculation". This ensures that fewer mistakes incur the full 2.5 hour penalty of a CI build that ultimately fails a test. The feature tests are there for safety. The unit tests are there for productivity. The feature tests are "Does this feature work so we can deploy it?". In theory you have at most 1-2 runs of these per feature, as unit tests _should_ catch issues before they get there. They're simply too expensive in terms of time to be your first layer of defense; rather, they're your final layer.


In a complex system a feature test can detect a problem but not diagnose it. For that a unit test is desirable.

The hole you get into is: lots of units are changed, a feature quits working, and everybody says 'it's not me!' and it doesn't get fixed.


> and everybody says 'it's not me!' and it doesn't get fixed.

Test automation is not a remedy to prop up broken leadership. I would appoint an arbitrary owner of the defect, and that person would work with other people as necessary to rule out or confirm various pieces of functionality one by one. Once the appropriate collision of changes is discovered, it becomes much clearer how to resolve it.


...and they would do that by writing individual unit tests.


Your development process should not allow someone to merge a change that fails the test suite, nor one that is too large to reason about.


I tend to agree, but in practice this doesn't work well if the feature is slow.

I was a tester on a project where the primary functionality was a workflow that took 40 minutes minimum to complete, often much longer (it involved migrating servers from one cloud provider to another), and the dev team had this same philosophy: no writing unit tests, just acceptance/feature tests run against live environments, which seemed okay - it was the start of the project, a new codebase, and it probably worked out well on a previous project.

But it was kind of a disaster on this project. With only feature testing: a dev makes a change, updated code is deployed for test, automated acceptance tests kick off and an hour later the tests fail, then they look through the app logs and finally discover a NoneType error or a KeyError or something (Python's dynamic typing did not help either). Then they fix the code and repeat the whole process. Hours later you've got one change tested and merged. Then it goes out to a pre-prod environment for in-depth feature testing with more permutations of inputs, and we'd find trivially broken code there too.

That's a worst case scenario, but it happened all the time. It just took way too long to run all the feature tests needed to get good coverage of code paths. We'd literally waste full days trying to get something out to production because of the slow dev/test cycle. Eventually, the devs caved and added unit tests because we ran into so much broken code.

I never appreciated unit tests much until after that project, especially for dynamically typed languages. It's the quickest and easiest way to weed out trivial issues in the code - human eyes will not catch everything in peer review and feature tests with a sufficient number of input permutations can take way too long to run.


Well, while I can somewhat agree with your point, it sounds like you are looking at it from a business standpoint. Someone who will be working with the details of the code is going to appreciate unit tests, not only because they verify all the tiny pieces of code the person wrote, but also because they make it very easy to track down where issues might be if a feature is broken.

I don't agree with the "100% test coverage" mentality, but I think there is value in unit and integration tests.


Please tell me all of the products that your code is contained in. I don't want to die in a self-driving car because it has superficial test coverage. (Well, it passed its feature tests that you wrote for it.)

There are very specific reasons for each level of testing and the benefits stack.


> Well, it passed its feature tests that I wrote for it

You (the programmer) write unit tests too, and even good code with well-written unit tests will have bugs in it, whether simply due to logic errors, or due to mocks.

> Please tell me all of the products that your code is contained in. I don't want to die in a self-driving car because it has superficial test coverage.

I just don't think this was the best way to go about your argument. Why attack the poster? Why not link your own papers about the benefits of unit tests?


A few things: First

I write unit tests; I tend to write a lot of them. They get added to as time goes on, as bugs are found, etc. (unlike the case where he was suggesting feature tests only). When you have a lot of code coverage and you find a bug (a logic error), your tests should either change due to a change in expectations, or be added to.

On Mocks

Mocks are really terrible for unit tests. Those tests then become integration tests, because you're not testing the unit of code, you're testing a larger system under test. The unit test should test the code contained in the unit (not whether you can connect to a faked-out driver for the DB). Use interfaces and stubs to avoid that.
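
A minimal sketch of the interface-and-stub idea (all names here are hypothetical):

  # The unit under test depends on an interface; the test supplies a trivial
  # in-memory stub instead of mocking a database driver.
  from typing import Protocol

  class UserRepository(Protocol):
      def find_email(self, user_id: int) -> str: ...

  class StubUserRepository:
      def __init__(self, emails: dict[int, str]):
          self._emails = emails
      def find_email(self, user_id: int) -> str:
          return self._emails[user_id]

  def greeting_for(user_id: int, repo: UserRepository) -> str:
      return f"Hello, {repo.find_email(user_id)}!"

  def test_greeting_uses_stored_email():
      repo = StubUserRepository({42: "ada@example.com"})
      assert greeting_for(42, repo) == "Hello, ada@example.com!"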

The attack

That was because the justification came from a lack of discipline. I don't have the time to teach someone who doesn't want to do the work to test their code, nor do they care to understand why testing on those levels is good. I've been through these arguments with others about it. From my experience they never seem to catch on that the amount of combinations of testing is going to get tiresome very quickly, they're going to miss things in the tests, the coverage reports are more inflated than the VW scandal (yes.. instrumentation says that line was hit.. but was the value tested), and they leave around landmines for anyone else that touches that code later. If you take any stake in what Venkat says: Code quality is about how you feel about others. In their case: people who only write feature tests regularly leave their mess laying around and want other people to deal with it.

DHH is an example of this; in his talk he was upset about how much he had to verify the small details. (That's good, that he's upset; it shows that he should start looking at creating new libraries that isolate him from that. It's not a good reason to say "oh, we don't need a test to verify `def add2(a:Int, b:Int) = a+b`".)


> I don't have the time to teach someone who

If that's the case then on this forum please ignore them rather than attacking them. We were all inexperienced once.


They're never going to learn if no one ever says anything about a bad practice.


That's true, but people also don't learn from (and none of us enjoy) being attacked for what they've said. Constructive, positive explanations are of more value ("x has the following benefits because..." rather than "y is bad. full stop").


I would imagine feature tests for a self-driving car would be everything from testing functionality of the radio to having the car parallel park or back into a parking spot while towing a trailer.


Good luck ever getting a test on the ABS brakes that way. But hey, the radio and parallel parking work. The "back into a parking spot while towing" case is a good example of how bikeshedding can happen even in testing. (You shouldn't be towing with a car; they're not a good fit for that.)


> The purpose of unit tests is to lock down the project's source code

I disagree slightly. If you're writing unit tests such that the code itself can't change without breaking the tests, perhaps the tests have been too tightly coupled.

The end goal of unit tests is to secure the expected functionality of the unit of code such that the tests will fail if the functionality stops working. But if changes to the code (refactoring, changing the ordering of certain lines, etc.) cause the tests to break, then what was tested wasn't the functionality but the implementation. You might as well write a test that loads each source file as a string and verifies it equals what you've written.

An example: my favorite set of tests I've written lately were for some classes to handle deduplication of a large set of records. One implementation wrote to disk (for data sets too large to fit into memory) while the other did it purely in memory. Since their functionality was supposed to be identical, I wrote one set of tests that both ran on. The tests could not be specific to the code written because it had to work on two sets of code. If we wrote a third implementation, the same tests should work for it as well.
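
A sketch of how that can be wired up with a parametrized fixture (the two class names below are stand-ins for the real implementations, not the actual code):

  import pytest

  @pytest.fixture(params=["disk", "memory"])
  def deduplicator(request, tmp_path):
      # Both implementations must satisfy the same behavioural contract,
      # so the same tests run against each of them.
      if request.param == "disk":
          return DiskBackedDeduplicator(workdir=tmp_path)
      return InMemoryDeduplicator()

  def test_removes_exact_duplicates(deduplicator):
      assert list(deduplicator.dedupe(["a", "b", "a", "c", "b"])) == ["a", "b", "c"]

  def test_preserves_order_of_first_occurrence(deduplicator):
      assert list(deduplicator.dedupe(["z", "y", "z"])) == ["z", "y"]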

(But what you said about integration tests: 100% agree)


IMHO unit tests are replacements for an interface contract specification language. You encode the requirements for the interface of your unit (whatever that may be) into tests. Writing unit tests early means that you think about the interface early. This way you can find awkward or wrong requirements early. Of course they're no replacement for integration tests, but they're supposed to be much cheaper to write, to run, and to change than integration tests.
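
For example, a contract-style unit test can state only what the interface promises, never how it is implemented (parse/serialize here are hypothetical names):

  import pytest

  def test_parse_inverts_serialize():
      # The contract: serializing and then parsing round-trips the data.
      original = {"id": 7, "tags": ["a", "b"]}
      assert parse(serialize(original)) == original

  def test_parse_rejects_invalid_payloads():
      # The contract: bad input raises a clear error rather than
      # returning a half-built object.
      with pytest.raises(ValueError):
          parse("not a valid payload")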


> interface contract specification language

Sounds a bit like C#'s code contracts. A team I was on tried it some years ago, and I think they were a great substitute for a bulk of the unit tests we wrote. It didn't catch on, unfortunately.


More generally, the idea that you can conquer complexity by writing only small components overlooks the fact that the complexity is found in the interactions of the components. Modularity is necessary, but not sufficient, for success in software development.


Very good comment about SW testing in general.

In my experience the only time when unit tests were useful was when they were integration tests at the same time.

For example, when writing a data structure, an algorithm, or a programming language/compiler. There is a clear boundary to such a system and one can reason about it as a black box in terms of input and output. As such, the test for this system is an integration test, but due to its isolation it looks so simple that someone would call it a unit test. A second example would be all the TDD blog posts where the author tests some trivial entity with perfect isolation. That confuses integration tests with unit tests again.

The biggest mistake people make when they write tests (or library functions) too early is that they miss the target. The first part of writing code is not to produce a clean and maintainable system, but to get the requirements right. Listening to people seldom works; it is usually better to observe people interacting with some prototype and infer requirements from that. In the words of Henry Ford, the goal is to find that the requirements are for a 'car' and not just for 'faster horses'. People who fossilize the requirements too early miss the target completely, in my experience.


I agree. The project I'm working on hovers around 70% coverage. As far as I can tell, we'd likely lose productivity pushing for any better coverage.

We get some bugs/issues that could be identified by better coverage, but most of our "bugs" stem from real world issues or differing views on functionality. Getting that right takes a ton of effort that simply isn't worth it right now.


Yep, testing definitely has a hugely decreasing marginal value.

I personally like the approach of writing a small amount of "obvious" tests to detect large functional breakage, and add a test for a bug once it's been identified aka a regression test.

Imo it's maximum test value per effort. Of course it depends on how impactful a bug could be.


The purpose of unit tests is to lock down the project's source code once it's essentially completed; to avoid regressions when making minor changes.

This is only true for development environments which have particular weaknesses. If your tools for code changes and refactoring easily cover changing your unit tests while you are refactoring, then you can use unit tests for rapid development. The use of Unit Tests in present day "agile" methods actually originates in an environment where this was the norm. (Extreme Programming, as developed at the Chrysler C3 project in VisualWorks Smalltalk, using the phenomenal Refactoring Browser.)

Your statement above would be 180 degrees turned around in an environment like that.

If you start writing unit tests too early in the project, you're effectively locking down units of code which haven't yet proved themselves to be useful to your project.

Reading this just makes me sigh. Not all environments have the same cost/benefit factors in all parts of the edit/test/debug cycle. Not all environments and languages have the same costs for doing similar development operations. As a matter of fact, these are generally very, very different, even across projects using the same language. Change those cost/benefit factors, and the pronouncements you are making become more valid or less valid, depending.

Not understanding that those cost/benefit factors can change and change exactly what your best practices are -- this is one of the most pervasive big misunderstandings that fosters poor management of programming projects.

(EDIT: I think a lot of the annoyance many people have with "agile" methods, stems from practices which work well in one environment migrated to another environment where they don't work well at all!)


Very well said, I'm in your camp as well.

The major issue is that no one ever learns about or talks about the "Unit" in unit tests. A Unit is supposed to be a component, albeit a very small one, which in OOP normally translates to a Class.

Your unit is a given Class; assuming its API doesn't change, the unit tests should continue working. If you haven't settled on an API for that unit yet, the tests are going to be coupled to a quickly moving target, and that, in my opinion, is a waste of time.

That said, people don't really design their Classes as units in a bigger set, with self-contained APIs and SOLID principles. That's the first mistake. And that then leads to tests which assert not a Unit, but some random piece of logic within a Unit, or against things that are not Units of their own, but already tightly coupled and leaky containers. That ends up producing both badly designed software that is inflexible and quick to rot, and poor tests which prevent you from making any meaningful design improvements without all the tests breaking.


> The purpose of unit tests is to lock down the project's source code once it's essentially complete

Good unit tests (like doctests) also provide superior documentation and examples.

Sure, worrying about documentation too soon is bad, but sometimes when doing early research, getting details clarified and written down is paramount and tests, maybe even unit tests, are a nice tool to have when you want it.
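
A toy doctest, just to illustrate the documentation-plus-test angle (run with `python -m doctest module.py`):

  def slugify(title: str) -> str:
      """Turn a title into a URL slug.

      >>> slugify("99% Code Coverage")
      '99-code-coverage'
      """
      cleaned = "".join(c if c.isalnum() or c == " " else "" for c in title.lower())
      return "-".join(cleaned.split())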


Unit tests are not only there to catch regressions, and they don't lock down anything. I found that doing TDD actually helps you design better; it naturally forces you to write easily testable code, which translates into more readable and maintainable code. Also, well-written unit tests prove very useful as living documentation. When you put the same care into them as you do into production code, you can very easily add features without needing to reverse-engineer the whole thing. Unit testing is one of those things that may seem counter-intuitive, but if you stick to it for a little while you see lots of positive effects. A bit like forcing yourself to hold the neck properly when learning the guitar: it slows you down at first and hurts a little, but if you do it you'll play better.


What is a unit? Is it the wheel or the car? When you start to see that the car is the unit, the difference between a unit test and an integration test gets blurry.


Unit tests should test individual subcomponents of your system in isolation. If your project is to build a car from scratch, then the unit cannot be the car. If your project is to build an autonomous fleet of self-driving cars, then from your project's perspective, the car could be a unit; the project to build the autonomous car would have different units from the project which manages the fleet of cars.


Exactly, context matters. In most cases when someone is writing for or against unit testing, they miss the part about defining what counts as a unit. If a class needs other classes to work, the whole set can be seen as one unit.

Who hasn't wanted to test a private method, before they saw that a unit is more than a function?


I like to call it the 'system under test' / SUT. Then I can go on to say a unit test is a test that's fast and if it fails points to a specific problem in the SUT. Smaller SUTs obviously help satisfy both. If setting up collaborators in a bigger SUT leads to either unacceptable runtimes or too much brittleness from many things being able to go wrong, it's a signal you're not getting the value of unit testing in that place. (The test itself may or may not be valuable, other types of testing have their own value propositions.)


This means that the unit tests will miss a lot of possible errors. Nothing wrong with that, just use integration testing or functional tests to cover the other part.


It does not matter. Are your tests preventing defects? Is it possible to run new or affected tests quickly? Are your tests reasonably easy to maintain?

When I am testing an application that is basically a layer between my app and a database, it does not make sense to mock the database. What, then, am I actually testing?

When I am testing a package that is pure logic, it makes sense to test it in isolation.

When I am building a user interface that is in flux, the best tests are manual.

My definition of a unit test: The smallest piece of functionality that can be tested in isolation and provide value.

This does not test a damned thing: assert.equal(dbQuery, "SELECT * FROM users")


The basic rule I go by is whether or not the test is self-contained. For example if you are making a trip to a database or hitting a 3rd party dll then it’s an integration test. I thought this was pretty widely accepted.


Wikipedia disagrees with that definition:

"Intuitively, one can view a unit as the smallest testable part of an application. In procedural programming, a unit could be an entire module, but it is more commonly an individual function or procedure. In object-oriented programming, a unit is often an entire interface, such as a class, but could be an individual method.[2] Unit tests are short code fragments[3] created by programmers or occasionally by white box testers during the development process. It forms the basis for component testing.[4]

Ideally, each test case is independent from the others. Substitutes such as method stubs, mock objects,[5] fakes, and test harnesses can be used to assist testing a module in isolation. Unit tests are typically written and run by software developers to ensure that code meets its design and behaves as intended."

https://en.wikipedia.org/wiki/Unit_testing


I don’t think that definition necessarily is in conflict with my point.


I would say that for most CRUD applications it is much easier to have the tests run against an SQLite database instead of mocking it, so I would let the tests update the database and check that the data in the database matches what I would expect from the input. If you change how you store the data, that isn't a side effect of the changes you are making but the main effect, so the test could be rewritten anyway. Some call that an integration test.
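
A minimal sketch of that style, using Python's built-in sqlite3 with an in-memory database (the table and function names are made up):

  import sqlite3

  def save_user(conn, name, email):
      conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
      conn.commit()

  def test_save_user_writes_a_row():
      # Real database, no mocks: assert on what actually ended up in the table.
      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
      save_user(conn, "Ada", "ada@example.com")
      rows = conn.execute("SELECT name, email FROM users").fetchall()
      assert rows == [("Ada", "ada@example.com")]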


Mocking is a last resort, usually the wrong choice, and usually used incorrectly. In-memory fakes like SQLite (or, better, running your real database with a small local in-memory footprint) are usually the way to get fast, correct, stable tests.


I wouldn’t mock a data layer unless I was unit testing (for example) response codes on a route depending on the data layer. In those cases you aren’t interested in the data as much as how it is handled. I like to be able to point an integration test at a “clean” database in order to test a data layer. In this way you can have separate tests to basically ensure your API responses and database queries are all valid given your database is in a valid state and the API request is valid. With this kind of setup I rarely actually have to run an application anymore to feel it is in good working condition, and that saves me a LOT of time.


It's true that obsessing over 100% code coverage is a bad thing.

It's false that the purpose of unit tests is to lock down the project's source code. Unit tests have multiple purposes, but that isn't one of them. The main purpose of unit tests is to allow developers to quickly reach confidence that code works, be it a code change or new code.

There is no such thing as writing unit tests too early.


Mutation testing is a neat idea I'd not heard of. Wonder how well it works in practice.

Someone's implemented a package for doing it with Go which looks good: https://github.com/zimmski/go-mutesting


I tried it. If you had a super serious algorithm where you were willing to spend any amount of money to test it (a central security algorithm, for example), it might be worth it.

The problem I had is that there were many, many mutations which led to code that was functionally identical, just a bit slower, or that took a different route. You have to manually verify every mutation to check for this.

For one mutation it took me 30 minutes to convince myself the algorithm was still doing the same thing, just in a different way.


I haven't worked with it extensively, but from what I've seen, it's far beyond what most teams should consider. It's basically a more thorough method of measuring test coverage, pointing out cases that have not been covered by your tests yet. However, the number of tests that it would have you write to reach 100% mutation coverage is not justified by the number of bugs you'll catch, unless the impact of any bug is very high (I suppose at NASA?). In fact, the amount of work required to even check which tests you missed is already not justified.

I usually use coverage as a tool to remind me of pieces of code I intended but forgot to test [1], so setting it to an arbitrary percentage is not that useful, in my opinion. And since 100% is usually not feasible either, that makes mutation testing in general not worth the effort.

Again, take my opinion with a grain of salt.

[1] https://vincenttunru.com/100-percent-coverage


I think mutation testing really shines in code bases that are already heavily tested, because it lets you discover test cases that you don't actually need. Tests are a burden, since you have to adapt them when you change the behavior. With mutation testing you can prune your test code by identifying tests that test very similar behavior.


Would that be done based on which (different) tests killed a particular mutant?

Often an integration test would catch the multiple mutations also being caught by (different) unit tests.

I assume that you mean that if a certain (broader) test kills the same mutants as X unit tests, those X tests are not really necessary?

I've looked into https://github.com/boxed/mutmut and https://github.com/sixty-north/cosmic-ray for Python projects, and there it is only important that a mutant gets killed, not how often or by which tests (therefore you can use `pytest -x` to continue with the next mutation after the first test fails due to it).


Me neither; I liked the concept of mutation testing. (I was doing this by performing changes manually, without knowing the technique had a nice name.) I would appreciate it if somebody could point out a mutation framework/tools for .NET.


https://github.com/fscheck/FsCheck is something I've used very briefly in the past. I did more work with this sort of thing in Scala.

In my experience, you usually end up with much more coverage than you want or need.


There's something that used to be called Pex, and now is apparently called IntelliTest [1], but I believe it's only available in the Enterprise edition of Visual Studio.

[1] https://docs.microsoft.com/en-us/visualstudio/test/intellite...


I hadn't realized this mutation testing existed as automated tooling. I'll be looking more into it. Traditionally, I've gone with "sabotaging" the code when writing unit tests: altering the code to verify a test goes from red to green or vice versa. Never trust a test that has never failed.


> Traditionally, I've gone with "sabotaging" the code when writing unit tests: altering the code to verify a test goes from red to green or vice versa. Never trust a test that has never failed.

That's completely backwards compared to how you should be doing things.

You either write the test:

1. to verify the existence of a bug by reproducing it, or

2. to formalize the spec for yet-to-be implemented code/feature.

And then you make the test green. Retroactively writing tests for working code, only to sabotage the code... Seems like an odd way of doing things.


Some people feel more comfortable writing code without tests first, gradually shaping final design of the interface. Especially if the tests involve a lot of stubbing and mocking of things like Redis and Sphinx Search, or messing with crypto tokens, parsing HTML, freezing time, setting up global config attributes like support emails, interaction between 2 different databases, etc. Tests are code as well and oftentimes might feel heavier than production code. You can of course say that there is a way things should be no matter what but that might lead to negative emotions, toxicity and complaining instead of getting stuff done.


Agree, actually designing & implementing good tests requires a lot of effort. It's rarely wasted effort but if you really need something out the door now it can distract from short term delivery.

e.g. I'm using a 3rd party transpiler / build tool, which I don't know all the details of; my test runner wouldn't do exactly the same transpilation of the code as this build tool, and it took me about an hour to figure out where the build tool's config file was and how to get it working with the test runner.

Did I learn more about my tooling and code base? Yes. Was it useful? well maybe not as I might kill this project shortly anyway.


That sounds like a complicated workaround for bad libraries (Redis does support unit tests?) and bad app design with strong coupling between remote components.


You don't always write a test first. A bug? Sure, write a test that shows the bug, fix it, make it green. A very well defined something? Sure, TDD. Many times, you will write code (or come across pre-existing code) and tests come last.


Yep, TDD is quite hard when requirements are vague/changing... the reqs changed? OK now I have to rewrite my test AND the code that passes it.

Vague/changing reqs is of course a separate problem.


While in principle I agree with you, there is something to be said for 'explorative' programming- where I don't quite know how I'm going to build this feature, or I don't quite know just what the refactoring will look like, etc. Most often it's that I don't fully understand the codebase I'm working with yet, and the tests that exist are old and crusty. I want to play a little while first.

I feel like that is a valid form of our craft, and it doesn't play well with TDD. It's not the primary way that I prefer to work, but I'm certain I'll be there later today (I've got a bastard of a sprint task waiting for me).


I might be wrong but aren't these called spikes?

I do exploratory coding in a similar way, usually with a repl next to my text editor, and this first single file version is usually always trash but helps me understand the problem space. Then I write it "for real" with the knowledge I've gained.


It's entirely possible to write code that already satisfies an as-yet-untested piece of desired functionality. If you want to make sure that piece of functionality doesn't get rewritten out later, you probably still ought to write a test. This test never fails, though, unless you deliberately force it to.

You _could_ just assume your test is correctly written, but I personally prefer to be sure by seeing it red at least once.


I think you misunderstand.

I'm not arguing against red tests. I'm arguing for them.

What I'm arguing against is 1. writing the finished code, 2. writing green tests to "prove" the code correct, 3. trying to sabotage the finished code to "prove" that the tests works by making them red.

Sounds crazy? That was what OP said he was doing!


It's quite possible that he's testing after. It's implied, but it isn't necessarily the case.

Anyway, we all agree testing is good. And that a test you've never seen fail is bad. Personally, I try to delete exploratory code and try to use tests to drive out the _real_ implementation, but I'm going to admit that at least sometimes I'll write code that's inherently "safe", and then add a test afterwards for garbage inputs, for example.


I'm pretty sure everyone understood. There are many ways to write code.


How do you know you have full test coverage if you don't mutate your code?


That’s a fair point. I don’t.

In that case you're not trying to prove the tests “correct” though, but instead trying to prove good coverage.

I’d argue there are several ways to do that, and that’s certainly an interesting approach to the problem.

In either case you’ll have trouble being 100% confident though, so I guess it’s a matter of deciding when enough is enough.


I had a similar experience, you always get what you measure.

IMO the best approach is, first, to integrate test coverage into code reviews, because there is no hard rule [0], and second, to write property-based tests [1].

[0] http://www.se-radio.net/2018/05/se-radio-episode-324-marc-ho...

[1] Sample framework for JavaScript https://github.com/jsverify/jsverify


I prefer just setting coverage requirements to 100%, and give developers the freedom to mark code as ignored for coverage (e.g. /* istanbul ignore next */ for many Javascript applications). That way, the annotation is something that can be brought up during code reviews in case the reviewer does not agree with that not being covered, and it doesn't depend on the reviewer having to remember to run or look at a separate coverage report.

(I wrote more about this here: https://vincenttunru.com/100-percent-coverage/)


I like the idea of property based testing, but have found that it obfuscated the test code, making it much harder to read (especially for new teammates). This was in Go, so maybe it had a lot more cruft around it than it would in something more functional or js.


Does anyone have experience with https://github.com/stryker-mutator/stryker for code mutation testing?


The author suggests adding mutations to the code, like replacing '<' with '<=', in order to "test the tests". He assumes that tests should fail due to such small differences.

I think this might apply when you write your own algorithms and want to test them. But if you are, like most of us, working on Business Logic, then you are probably writing the wrong tests.

We usually want to know whether a workflow or a customer story works as intended and as such we should write more Integration tests.

The idea itself has merit though, and draws parallels to writing tests with variable or random input parameters; I am pretty sure there was already something on this.


You think customers aren't affected if you mistakenly replace a < with a <= in your code?


  while (car.wheels < 4)
    car.attach(new Wheel);


'<' versus '<=' can be very important, and it can cause severe bugs when dealing with things like array indices or financial data.

There are some cases where it's not a big deal. '<' and '<=' can be interchangeable if one or both of the values are a rough estimate. One example would be comparing two images and doing something if they are a close match. It might not be too bad if you use `<= 0.995` instead of `< 0.995` when checking the percentage.

Some other examples might involve accelerometer data, pressure sensors, GPS boundaries, debounce algorithms, animations, etc. But I think it's just a good habit to write test cases that test those boundary values.
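
A small hypothetical example of boundary-value tests, which are exactly the tests that kill a '<' vs '<=' (or '>=' vs '>') mutant:

  def qualifies_for_discount(total: float, threshold: float = 100.0) -> bool:
      return total >= threshold

  def test_just_below_threshold_does_not_qualify():
      assert not qualifies_for_discount(99.99)

  def test_exactly_at_threshold_qualifies():
      # Mutating '>=' to '>' makes this test fail, so the mutant is killed.
      assert qualifies_for_discount(100.0)

  def test_just_above_threshold_qualifies():
      assert qualifies_for_discount(100.01)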


Basically, it depends on whether we're dealing with floats or integers.


Thanks for the feedback. The post only answers "Do we have a good safety net to change this legacy code?". Mutation testing is not a new concept. It's usually used in critical systems: the system is mutated and you measure the time for detecting the mutation and the time for solving the problem.


I can't imagine, as a developer, writing tests without assertions. Any developer who does that is actively making the situation worse for the team and the application.

Conversely, I've worked in teams whose managers encouraged the developers to forgo tests for the sake of delivering faster. Never again.


Some tools for mutation testing in Python:

- https://github.com/boxed/mutmut

- https://github.com/sixty-north/cosmic-ray


You can do that, or you can let priority of business requirements drive your testing efforts. For each project, I have a certain set of hard requirements that will never be used in the field. We deliver these half-assed and fix bugs if 2 years later someone uses the feature by accident.


Use a language that doesn’t allow incorrect code and you get 100% code coverage without writing tests.

Smart compilers and precise language (ala Haskell) obviate the need for writing unit tests.


Who is going to test that the mutators are doing the right kind of mutations? lol I’ve worked on code that has 1000’s of tests and is still a bug ridden hell.


Those tools also have tests.. :)


I always achieve 100% relevant code coverage. I just set up a test with one input which tests my main(), which then obviously calls all the relevant parts of the code.

Or am I missing something in the definition of code coverage?

I don't see anyone using exhaustive-input tests, which is usually impossible anyway, or splitting up each branch of conditionals into separate functions so as to make them unit testable.

I only see people splitting up the code into arbitrary function blocks, having at least one unit test for such function and then declaring all lines of code of that function as test-covered.
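
For instance (hypothetical function), both of the tests below produce the same line coverage for `apply_discount`, but only the second one would catch a broken calculation:

  def apply_discount(price: float, percent: float) -> float:
      return price * (1 - percent / 100)

  def test_covers_the_lines_but_checks_nothing():
      apply_discount(200.0, 50.0)  # every line executes, nothing is asserted

  def test_actually_checks_the_result():
      assert apply_discount(200.0, 50.0) == 100.0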



