
99% code coverage (2017) - fagnerbrack
https://rachelcarmena.github.io/2017/09/01/do-we-have-a-good-safety-net-to-change-this-legacy-code.html
======
cryptica
The industry is obsessed with getting 100% unit test code coverage even though
it doesn't mean anything for the project. The purpose of unit tests is to lock
down the project's source code once it's essentially complete, in order to
avoid regressions when making minor changes.

If you start writing unit tests too early in the project, you're effectively
locking down units of code which haven't yet proved themselves to be useful to
your project. If you build square wheels, for example, you may not realize
that they're designed incorrectly until you attach them to the car and see
that the car doesn't function well with them. It makes no sense to preemptively
write unit tests for a component which has a very high likelihood of not being
in its final desired state; you're just giving yourself more work refactoring
the unit tests over and over; or worse, you become afraid of refactoring the
tests and so you lie to yourself that square wheels are fine.

Integration tests are by far the most useful tests to have at the beginning
and middle stages of a project; they allow you to keep your focus on the real
goals of the project instead of getting stuck designing the perfect square
wheel.

~~~
austincheney
It drives me nuts during interviews talking about test automation, because
people are very particular about the type of testing, whether it's unit
testing, integration testing, acceptance testing, or whatever.

In my mind you only need one kind of testing: feature tests. Does the
application provide the expected output for a given input and/or
configuration? The application does all that it claims to do in a very precise
way, or it doesn't. Everything else is extraneous, though you may have to test
the application in wildly different ways, with wildly different means of
automation, to validate all feature support.

Feature tests are executed by running the application with a given input and
configuration and comparing the result against a known expected output.
Provided sufficient feature coverage, this is enough to test for regressions.
While the feature tests are running, each test runs against a clock so that
dramatic swings in performance can be quantified. When that is not enough, as
in the case of accessibility, have people (actual users) perform the feature
tests.
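
Roughly, the shape of such a test (a hypothetical TypeScript sketch; `app`
stands in for whatever launches the application):

    import { strict as assert } from "node:assert";

    // Run the app with a given input and configuration, compare the result
    // against a known expected output, and time it against a budget.
    type App = (input: string, config: Record<string, unknown>) => Promise<string>;

    async function featureTest(
      app: App,          // whatever launches the application
      input: string,
      config: Record<string, unknown>,
      expected: string,
      budgetMs: number,  // flag dramatic swings in performance
    ): Promise<void> {
      const start = Date.now();
      const output = await app(input, config);
      const elapsedMs = Date.now() - start;

      assert.equal(output, expected); // does the feature do what it claims?
      assert.ok(
        elapsedMs <= budgetMs,
        `took ${elapsedMs}ms, budget was ${budgetMs}ms`,
      );
    }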

Other problems, like bad design, redundant features, or poor code quality, are
caught with code reviews and validation tools, such as a linter.

~~~
jeffasinger
I think your argument is missing the "cost" of feature tests: they necessarily
run much slower than tests that exercise only a small unit of code.

For example, I worked on payroll software at some point. After finding a bug,
we'd want to ensure it could never happen again, and would add a test for it
going forward. I may have needed to write a test around someone who worked in
one city in Ohio, lived in another, had previous income above some amount, and
had certain tax-advantaged benefits. Setting up this scenario via feature
testing is certainly possible, but it's likely the test itself will take a
significant amount of wall time to execute. It's way faster to just test the
payroll calculation code, which means you can run the tests more often, and
with less developer inconvenience.

~~~
austincheney
> but it's likely the test itself will take a significant amount of wall time
> to execute.

Why? What about your application changed so that it is slower when testing
compared to real-world use? If anything it should be dramatically faster,
because people don't provide microsecond-accurate automated responses. If the
application naturally executes very quickly, I would imagine it would take far
longer to set up the test scenario than to execute against it.

In my own applications, if they take more than two seconds to deliver a
response (even for a 5 MB input), then at the very least I have a critical
performance defect. There are not many administrative tasks I can complete
from start to finish in that time frame.

~~~
jeffasinger
Sure, let's go back to my payroll example.

I need to make 8 HTTP requests to properly set up my test from a clean slate,
including setting up the deductions, previous earnings, company location, and
employee home address. If each of those requests takes 50ms, and then I need
to make a request to actually execute my test and verify everything went
well, that could easily be 500ms.

And that's just what needs to happen to run one end-to-end test of the payroll
calculation feature. It's absolutely valuable to have a few such tests, but
I'm not going to run tests for all the weird one-off scenarios (like what
happens if someone lives in NJ, works in Yonkers, and the company has a
location in NJ as well?) because they'd take minutes to run. I could run that
entire scenario as a 2ms unit test of the payroll calculation. That lets me
run all of the tests I need very quickly.
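
For illustration, the unit-test version of that scenario might look like this
(hypothetical names; `calculatePayroll` stands in for the real calculation
module):

    import { strict as assert } from "node:assert";
    import { calculatePayroll } from "./payroll"; // hypothetical pure function under test

    // The whole scenario is a plain in-memory object: no HTTP requests,
    // no database setup, just the calculation itself.
    const result = calculatePayroll({
      homeState: "NJ",
      workCity: "Yonkers",
      companyStates: ["NJ", "NY"],
      grossPay: 5000,
    });

    // Assert whatever the correct withholding is for this case.
    assert.ok(result.totalWithheld >= 0);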

~~~
dnautics
Can't you run these feature tests in parallel? I'm running a system where it
full-on downloads and mounts containers and executes them (on a loopback:
they get hosted by transient webservers on the test host, aka my laptop, but
possibly also Travis), about 50 of them in parallel with tons of database
calls, and it usually takes around 10-20s to complete.
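
The pattern is basically this (a sketch; the hypothetical `runOne` callback is
where the container gets started and the test executed):

    // When each feature test is isolated (its own container, its own
    // database), total wall time approaches the slowest single test
    // rather than the sum of all of them.
    async function runAll<T>(
      cases: string[],
      runOne: (name: string) => Promise<T>,
    ): Promise<T[]> {
      return Promise.all(cases.map(runOne)); // all ~50 in flight at once
    }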

~~~
Macha
It only takes you 20s to start up 50 containers? I doubt I could start 50
Alpine containers in that time on my company-issued 2015 MBP. (Edit: a quick
test reveals my personal desktop can in fact accomplish this, but it's much
more powerful than my work MBP and has no virtualization overhead, since it's
running native Docker instead of Docker for Mac.)

I don't know about jeffasinger's company, but in optimal circumstances, a test
instance of our app running against an in-memory H2 DB takes 3 minutes to
start. The app already performs heavy lifting in parallel internally, so it's
also not clear that running multiple instances of the app would make things
much faster...

~~~
dnautics
Singularity containers, and my dev laptop is Linux, so way less overhead than
Docker on Mac. I'm simulating request load on a scheduler for compute jobs. In
principle those jobs are not generally local, but it helps to stress-test the
scheduler.

------
peterwaller
Mutation testing is a neat idea I'd not heard of. Wonder how well it works in
practice.

Someone's implemented a package for doing it with Go which looks good:
[https://github.com/zimmski/go-mutesting](https://github.com/zimmski/go-mutesting)
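
For a flavor of what such a tool does (a hypothetical illustration in
TypeScript rather than Go):

    // original
    function isAdult(age: number): boolean {
      return age >= 18;
    }

    // one generated mutant: ">=" replaced with ">"
    function isAdultMutant(age: number): boolean {
      return age > 18;
    }

    // A suite that never checks the boundary (age === 18) passes against
    // both versions, so the mutant "survives" and exposes the gap.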

~~~
jesusmg
Me neither; I like the concept of mutation testing. (I had been doing this by
making changes manually, without knowing the technique had a nice name.) I
would appreciate it if somebody could point out a mutation framework or tools
for .NET.

~~~
purity_resigns
[https://github.com/fscheck/FsCheck](https://github.com/fscheck/FsCheck) is
something I've used very briefly in the past. I did more work with this sort
of thing in Scala.

In my experience, you usually end up with much more coverage than you want or
need.

------
sethammons
I hadn't realized mutation testing existed as automated tooling. I'll be
looking more into it. Traditionally, I've gone with "sabotaging" the code when
writing unit tests: altering the code to verify a test goes from red to green
or vice versa. Never trust a test that has never failed.

~~~
josteink
> Traditionally, I've gone with "sabotaging" the code when writing unit tests:
> altering the code to verify a test goes from red to green or vice versa.
> Never trust a test that has never failed.

That's completely backwards compared to how you should be doing things.

You either write the test:

1. to verify the existence of a bug by reproducing it, or

2. to formalize the spec for yet-to-be-implemented code/features.

And then you make the test green. Retroactively writing tests for working
code, only to sabotage the code... seems like an odd way of doing things.
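
Concretely, case 1 might look like this (hypothetical names):

    import { strict as assert } from "node:assert";
    import { parseAmount } from "./money"; // hypothetical function with a reported bug

    // Reproduce the bug first: this fails (red) against the current code
    // and only turns green once the fix lands.
    assert.equal(parseAmount("42.00 "), 4200); // trailing whitespace used to throw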

~~~
paulriddle
Some people feel more comfortable writing code without tests first, gradually
shaping the final design of the interface. Especially if the tests involve a
lot of stubbing and mocking of things like Redis and Sphinx Search, or messing
with crypto tokens, parsing HTML, freezing time, setting up global config
attributes like support emails, interaction between two different databases,
etc. Tests are code as well, and oftentimes they can feel heavier than
production code. You can of course say that there is a way things should be
done no matter what, but that might lead to negative emotions, toxicity, and
complaining instead of getting stuff done.

~~~
crispyporkbites
Agree: actually designing and implementing good tests requires a lot of
effort. It's rarely wasted effort, but if you really need something out the
door now, it can distract from short-term delivery.

E.g., I'm using a third-party transpiler / build tool which I don't know all
the details of. My test runner wouldn't do exactly the same transpilation of
the code as this build tool, and it took me about an hour to figure out where
the build tool's config file was and how to get it working with the test
runner.

Did I learn more about my tooling and code base? Yes. Was it useful? Well,
maybe not, as I might kill this project shortly anyway.

------
pedro1976
I had a similar experience: you always get what you measure.

IMO the best approach is, first, to integrate test coverage into the code
reviews, since there is no hard rule [0], and second, to write property-based
tests [1].

[0] [http://www.se-radio.net/2018/05/se-radio-episode-324-marc-hoffmann-on-code-test-coverage-analysis-and-tools/](http://www.se-radio.net/2018/05/se-radio-episode-324-marc-hoffmann-on-code-test-coverage-analysis-and-tools/)

[1] Sample framework for JavaScript
[https://github.com/jsverify/jsverify](https://github.com/jsverify/jsverify)
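
A minimal example with jsverify (assuming its standard forall/assert API):

    const jsc = require("jsverify");

    // Property: reversing an array twice yields the original array.
    // jsverify generates many random inputs and shrinks any failing case.
    jsc.assert(
      jsc.forall(jsc.array(jsc.nat), (xs) =>
        JSON.stringify([...xs].reverse().reverse()) === JSON.stringify(xs),
      ),
    );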

~~~
Vinnl
I prefer just setting coverage requirements to 100% and giving developers the
freedom to mark code as ignored for coverage (e.g. /* istanbul ignore next */
for many JavaScript applications). That way, the annotation is something that
can be brought up during code review in case the reviewer does not agree with
leaving that code uncovered, and it doesn't depend on the reviewer having to
remember to run or look at a separate coverage report.

(I wrote more about this here: [https://vincenttunru.com/100-percent-coverage/](https://vincenttunru.com/100-percent-coverage/))
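
A sketch of how that looks in practice (Jest-style config shown as one
possibility; other runners have equivalent settings):

    // jest.config.js: the bar is 100%, so any uncovered line fails CI
    module.exports = {
      coverageThreshold: {
        global: { branches: 100, functions: 100, lines: 100, statements: 100 },
      },
    };

    // in application code: the annotation is visible in the diff,
    // so a reviewer can challenge the decision not to cover it
    /* istanbul ignore next */
    function unreachableGuard(x: never): never {
      throw new Error(`unexpected value: ${x}`);
    }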

------
dankohn1
Does anyone have experience with
[https://github.com/stryker-mutator/stryker](https://github.com/stryker-mutator/stryker)
for code mutation testing?

------
Radle
The author suggests adding mutations to the code, like replacing '<' with
'<=', in order to "test the tests". The assumption is that tests should fail
on such small differences.

I think this might apply when you write your own algorithms and want to test
them. But if you are, like most of us, working on business logic, then you are
probably writing the wrong tests.

We usually want to know whether a workflow or a customer story works as
intended, and as such we should write more integration tests.

The idea itself has merit, though, and draws parallels to writing tests with
variable or random input parameters; I am pretty sure there was already
something on this.

~~~
adrianN
You think customers aren't affected if you mistakenly replace a < with a <= in
your code?

~~~
majewsky

      while (car.wheels < 4)
        car.attach(new Wheel);

------
luord
I can't imagine, as a developer, writing tests without assertions. Any
developer who does that is actively making the situation worse for the team
and the application.

Conversely, I've worked in teams whose managers encouraged the developers to
forgo tests for the sake of delivering faster. Never again.

------
blueyed
Some tools for mutation testing in Python:

- [https://github.com/boxed/mutmut](https://github.com/boxed/mutmut)

- [https://github.com/sixty-north/cosmic-ray](https://github.com/sixty-north/cosmic-ray)

------
_pmf_
You can do that, or you can let the priority of business requirements drive
your testing efforts. Each project I work on has a certain set of hard
requirements that will never be used in the field. We deliver these half-assed
and fix bugs if, two years later, someone uses the feature by accident.

------
simplecomplex
Use a language that doesn’t allow incorrect code and you get 100% code
coverage without writing tests.

Smart compilers and precise languages (à la Haskell) obviate the need for
writing unit tests.
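
For flavor, here's the idea transposed to TypeScript (a much weaker type
system than Haskell's, so this illustrates the principle rather than proving
it): make invalid states unrepresentable, and a whole class of tests has
nothing left to check.

    // A remote-data value cannot be "loaded with no data" or
    // "failed with no error": the compiler rejects those states.
    type RemoteData<T> =
      | { state: "loading" }
      | { state: "loaded"; data: T }
      | { state: "failed"; error: Error };

    function render(d: RemoteData<string>): string {
      switch (d.state) {
        case "loading": return "…";
        case "loaded":  return d.data;          // data guaranteed present
        case "failed":  return d.error.message; // error guaranteed present
      }
    }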

------
iamleppert
Who is going to test that the mutators are doing the right kind of mutations?
lol. I've worked on code that has thousands of tests and is still a bug-ridden
hell.

~~~
blueyed
Those tools also have tests... :)

------
snidane
I always achieve 100% relevant code coverage. I just set up a test with one
input which tests my main(), which then obviously calls all the relevant parts
of the code.

Or am I missing something in the definition of code coverage?
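
Something like this (a hypothetical Jest/Mocha-style sketch):

    import { main } from "./app"; // assumed entry point

    // One "test", near-total line coverage: every line main() reaches is
    // reported as covered, yet no behavior is verified at all.
    it("runs main once", async () => {
      await main(["--input", "one-fixture.txt"]); // no assertions
    });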

I don't see anyone using exhaustive-input tests, which is usually impossible
anyway, or splitting up each branch of a conditional into separate functions
so as to make them unit-testable.

I only see people splitting up the code into arbitrary function blocks, having
at least one unit test for each such function, and then declaring all lines of
code of that function test-covered.

