
How I Write Tests - henrik_w
https://blog.nelhage.com/2016/12/how-i-test/
======
xivusr
Excellent article and it's perfect timing for me!

In the past while contracting I was usually asked to include in my proposals
estimates for tests.

The tests failed to be useful, simply because they were written _after_ the
feature was actually implemented! We knew better, of course, but this was done
in order to quickly get builds a client could look at.

Then when clients wanted to add/change features guess what got cut to make up
for the time? That's right, the tests!

So the tests were always secondary, and the projects tended to suffer as a
result.

Recurring "fixed bugs" cost more than just working hours to fix. In the eyes
of a client or customer, they are so much worse than a shiny new bug. Tests
can help catch recurring bugs _before_ a client/customer does - and save you
not only time, but from losing your customer's confidence.

Now, I'm building my own app and I'm using a disciplined TDD approach. I didn't
start my project this way, as it seemed overkill when it was just me. But I saw
early on that not practicing TDD, even solo, was madness. It _is_ taking
longer, but my actual progress is consistent, and I'm already far more
confident about the stability of the app.

~~~
MaulingMonkey
> The tests failed to be useful, simply because they were written after the
> feature was actually implemented!

I find after-the-fact asserts, unit tests, and static analysis still useful:

- I catch incorrectly handled edge cases that I haven't hit "in the wild" yet
(although others may have!)

- I think of new edge cases to handle

- I can be more aggressive in future refactoring

- I can be more confident in excluding the code from my bug hunts

Many of the issues I catch this way are the kind of issues that threaten to
turn into really nasty heisenbugs - such as the occasional missing mutex lock.
After all, all the low hanging fruit was probably caught when testing the
feature locally ;)

~~~
lisivka
Parent is talking about functional test cases, i.e. something that can be
tested manually from the command line. Unit tests are not functional or
integration tests, so yes, unit tests are useful even after functional and
integration tests are written.

~~~
aisofteng
The concept of a functional test seems to vary quite a bit more from one
engineering group to another than your usage indicates.

Sometimes, some unit tests border on functional tests and vice versa, the
distinction blurring between two adjacent test cases in a unit test suite.

And sometimes integration suites end up testing single functionalities at a
time, depending on the design and/or requirements of the application, and
could reasonably be called functional tests.

On my team, I hesitate to emphasize "functional tests" as a standalone group
equivalent in semantic distinction to unit and integration tests; the proper
scope of a "functional" test is defined, to me, by the way the functionalities
of an application were designed and decoupled, which depends on the particular
project.

(If your organization uses functional requirements and specifications for
software design, it's simpler - a functional test verifies a functional
requirement. Unfortunately, having functional requirements as a part of the
engineering process is significantly rarer than it should be.)

------
mberning
Interesting post. I have found myself doing a lot of the same things through
my own experience. One thing I have been striving for lately is to build
functionality using more small methods that accept parameters, as opposed to
relying on state stored in instance variables. I find that this
makes my life easier when writing tests and also helps identify corner and
edge cases that I may not have thought about. And when something does break it
is usually very easy to add a failing test, fix the code, and see that
everything is now working. Also makes me a lot more confident when I go to
refactor. Sandi Metz gave a great talk on the gilded rose problem that
explores these concepts.
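
A rough sketch of the difference in Python (names invented for illustration):

    # State-heavy style: the method reads instance variables, so a test has
    # to construct and prime the whole object before it can check anything.
    class OrderSummary:
        def __init__(self, items, shipping_fee):
            self.items = items
            self.shipping_fee = shipping_fee

        def total(self):
            return sum(i["price"] * i["qty"] for i in self.items) + self.shipping_fee

    # Parameter style: a small function with explicit inputs and outputs.
    # Edge cases (empty order, zero fee) are easy to see and enumerate.
    def order_total(items, shipping_fee):
        return sum(i["price"] * i["qty"] for i in items) + shipping_fee

    def test_order_total_empty_order():
        assert order_total([], shipping_fee=0) == 0

    def test_order_total_adds_shipping():
        assert order_total([{"price": 10, "qty": 2}], shipping_fee=5) == 25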

~~~
Kiro
How do you keep the parameters from getting bloated? Like having to pass the
same dependencies into every method.

~~~
mberning
It's a judgement call to be sure. If I start seeing a pattern of adding a
bunch of parameters I stop and rethink what I am doing. Usually there is a
structural problem that can be fixed.
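
A sketch of one such fix (Python, illustrative names only): if several
functions keep taking the same cluster of arguments, that cluster is often a
missing type.

    from dataclasses import dataclass

    # Before: the same three arguments travel together through every call.
    def render_header(user_name, locale, timezone): ...
    def render_footer(user_name, locale, timezone): ...

    # After: the cluster becomes one small value object, so signatures stay
    # short and a test only has to build one thing.
    @dataclass(frozen=True)
    class ViewerContext:
        user_name: str
        locale: str
        timezone: str

    def render_header(ctx: ViewerContext): ...
    def render_footer(ctx: ViewerContext): ...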

------
ridiculous_fish
> My final, and perhaps more important, advice is to always write regression
> tests. Encode every single bug you find as a test, to ensure that you’ll
> notice if you ever encounter it again.

This is a nice ideal but in practice can be really hard. For example, say I
fix a grammar mistake in a text label (or, hard mode, a code comment). One
could write a test, perhaps integrating a grammar checker, but this is a lot
of work, for a low reward. So where do we draw the line?

It's easy to test things that have inputs and output representable as binary
blobs. It's hard to test that this animation is smooth, that the build
succeeds on obscure systems, that this graphic is rendered acceptably, that
this event happens when the user connects that device. Or, rather, it's easy
for a person to manually test any of these things, but hard to write an
automated test for them.

A common failure mode is to capture the input and thereby isolate the system.
If 95% of your bugs are due to interactions with your dependencies, isolating
your system is going to find very little compared to a full integration test.
This is a great way to make the tests pass, and also make the tests useless.

Maybe we should conceptualize testing on two axes: automated testability and manual
testability. If you try to write tests for components that have poor automated
testability, you'll hit a valley of pain: spurious failures, excessive mocking
of your dependencies, etc., that you can only climb out of with great effort
(e.g. a robot inserts an HDMI cable). If these components have good manual
testability, then it may be more cost-effective to just test them manually.

------
crdoconnor
> My final, and perhaps more important, advice is to always write regression
> tests. Encode every single bug you find as a test, to ensure that you’ll
> notice if you ever encounter it again.

This is good advice.

On a previous (technical-debt ridden) project I did a little measuring and
there was a pretty clear hierarchy of test value - in terms of detected
regressions:

1) Tests written to invoke bugs.

2) Tests written before implementing the feature which makes them pass.

3) Tests written to cover "surprise" features (i.e. features written by a
previous team that I never noticed existed until they broke or I spotted
evidence of them in the code).

4) Tests written after implementing the feature.

5) Tests written just for the sake of increasing coverage.

Surprisingly 5 actually ended up being counter-productive most of the time -
those tests detected very few bugs but still had a maintenance and runtime
overhead.

~~~
housel
What do people think about writing explicit regression tests for bugs found
through fuzz testing? I've tended to lean towards not writing them, depending
on continuing fuzz testing to ensure things stay clean. This may of course be
somewhat naive, but I also fear cluttering my test suite with really obscure
test cases.

~~~
aisofteng
I think the premise of your question conflates two separate considerations,
unless I've misunderstood.

On all projects I own, the policy is that a bug fix will not be merged into a
codebase without comprehensive unit testing demonstrating the case in which
that bug was discovered, and that it has been resolved.

I do not understand why it matters _how_ the bug was discovered. If fuzz
testing discovered that function foo tries to dereference a null pointer given
the input "ABABAB", then I would expect the engineer who chose to address that
bug to investigate what property of "ABABAB" is unaccounted for, account for
it, and then write a unit test calling foo with input "ABABAB", along with
several other inputs that share the same discovered underlying property.

Fuzz testing may be a different method of testing, but the end result is,
regardless, that you have discovered an input that your application hasn't
been designed to handle properly and that needs to be demonstrably fixed,
whatever it may be in particular.
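
To make that concrete, a rough sketch (Python; parse_header and its fix are
invented, just to show the shape such a regression test might take):

    import pytest

    def parse_header(raw):
        # Fixed version: the fuzzer-found input exposed that we had assumed
        # every header contains a ':' separator.
        if ":" not in raw:
            raise ValueError("malformed header: %r" % raw)
        name, value = raw.split(":", 1)
        return name.strip(), value.strip()

    def test_fuzz_regression_missing_separator():
        # The exact input the fuzzer found...
        with pytest.raises(ValueError):
            parse_header("ABABAB")

    def test_other_inputs_sharing_the_same_property():
        # ...plus other inputs that share the underlying property
        # (no separator at all).
        for raw in ["", "just-a-token", "   "]:
            with pytest.raises(ValueError):
                parse_header(raw)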

------
aisofteng
This post is timely for me because it happens to encapsulate the balance I've
recently found between test-driven development and "test after writing"
development, which seems to be very effective for me.

As the article notes, writing all tests first is unreasonable because you
won't know all implementation details until you have to make them; the tests I
write first are thus functional tests, nowadays with Cucumber.

Writing tests after coding is lacking, philosophically, because you often
spend your time defining the abstractions and then just rewriting a
verification of that abstraction in tests, plus some null checks.

The balance I've been using has been to write tests for the abstractions I come
up with, one by one. If an abstraction is decoupled and encapsulated, the unit
tests come naturally. If I have to write a lot of mocks for an abstraction,
that often tells me it isn't cleanly decoupled or simplified.
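
A small sketch of what that smell looks like to me (Python; everything here is
invented):

    # Mock-heavy: when a pricing rule is buried in a method that also talks
    # to the database and the mailer, the test has to stub both:
    #
    #     def test_discount(mocker):
    #         repo = mocker.Mock()
    #         mailer = mocker.Mock()
    #         ...
    #
    # Decoupled: pull the rule out as a pure function and the mocks vanish.
    def discounted_price(price, loyalty_years):
        rate = 0.10 if loyalty_years >= 5 else 0.0
        return price * (1 - rate)

    def test_discount_applies_after_five_years():
        assert discounted_price(100.0, 5) == 90.0

    def test_no_discount_for_new_customers():
        assert discounted_price(100.0, 1) == 100.0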

Furthermore, as you write tests as you go this way, you often find yourself
writing the same support code more than once, at which point you notice it and
find abstractions in that support code; this ends up explicitly giving you a
view of what conscious and subconscious assumptions you have about what inputs
you are expecting and what assumptions you have made. This is often
enlightening.

------
dvirsky
I don't like to write tests ahead of code, but the idea of "Avoid running
main" is very powerful and something that helps me a lot. Usually in a new
project I try to just use tests as the playground for the evolving code, and
delay actually creating a working application for as long as I can (not so
hard in non-UI apps). In an existing project you delay integrating your new
module with the whole app.

Sometimes my tests just start as a bunch of prints to see the results
visually. Then when I'm happy with the results I convert these prints to
assertions and the playground becomes a real test suite.
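
A tiny example of that progression (Python; the function is made up):

    def slugify(title):
        return "-".join(title.lower().split())

    # Playground phase: just eyeball the output.
    #     print(slugify("Hello World"))        -> hello-world
    #     print(slugify("  Extra   Spaces  ")) -> extra-spaces

    # Once the output looks right, freeze it as assertions.
    def test_slugify():
        assert slugify("Hello World") == "hello-world"
        assert slugify("  Extra   Spaces  ") == "extra-spaces"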

~~~
aisofteng
It is a nice quote, but frankly I would not want that to be the takeaway.
Remembering "Avoid running main" is just a principle without reasoning or
justification, beyond remembering that it sounded right when first read.

The underlying concept is that unit tests should verify behavior of minimal
units of functional code. If you are running the application, you are testing
much higher-level functionality. "Running main" and "running unit tests"
are completely different things, and I would rather the principle behind this
difference be the takeaway, rather than just "avoid running main."

~~~
watt
The advice really is to avoid the quick dopamine hit of running the app and
seeing the new feature work by manual testing.

It's a motivation thing, and I think it can be a very valuable tip for how to
motivate yourself to keep up your unit test suite (for those of us who need
this behavioral trick).

------
mooreds
Nice to see a non-religious post about testing.

Particularly enjoyed the emphasis on regressions. I converted to testing when
working on a relatively complex data transformation. This was replacing an
existing, scary data transformation process that was hard to test (we'd run
new code for a few days and do a lot of manual examination), so I made extra
certain to design the new system so it was testable. Catching regressions in
test, especially for data processing, is just so much better than catching and
repairing them in production.

------
napo
> I fully subscribe to the definition of legacy code as “code without an
> automated test suite.”

> I’ve never really subscribed to any of the test-driven-development
> manifestos or practices that I’ve encountered.

I feel the exact opposite. I've worked on projects with a lot of legacy code,
both with BDD and with UTs that we added later on.

Even with the best intentions the latter always failed: we always ended up
writing a lot of unreadable tests that had no meaning and that we were afraid
to look at. However, when I was working in a team fully committed to BDD, we
looked at the tests before looking at the code, the tests were at the center
of the development process, and we were able to write fast, solid, and simple
tests.

Nowadays, I'm more interested in articles that acknowledge that tests can be a
pain too. And tbh I don't really trust articles that aim at high coverage
without talking about the different challenges that come with tests.

~~~
aisofteng
This isn't exactly a fair comparison, in my opinion. Legacy code that was
written before unit testing had become a habit tends to have a design that
isn't always easily covered by unit tests; furthermore, when you give a team
of engineers legacy code and ask them to add tests, they have to trace the
source, make an interpretation of what they perceive the design considerations
to have been, and then write tests to that. What you often end up with,
though, is starting by unit testing the easier-to-understand pieces of code,
checking for robustness to some bad inputs, and then somewhat skimping on unit
tests on the code that most embodies the original authors' design
considerations.

Which, to be fair, is a highly nontrivial task that will realistically never
be completed as well as if the original authors had written unit tests
demonstrating the intent of their design. And the comparison you should have
been making is to that scenario.

------
nickpsecurity
People into testing everything should also remember there are test-generation
tools in the commercial and FOSS space to reduce the work necessary to do
this. Here are two examples, for LLVM and Java respectively. I'm including the
KLEE PDF since the results in the abstract are pretty amazing.

[https://klee.github.io/](https://klee.github.io/)

[https://www.doc.ic.ac.uk/~cristic/papers/klee-osdi-08.pdf](https://www.doc.ic.ac.uk/~cristic/papers/klee-osdi-08.pdf)

[http://babelfish.arc.nasa.gov/trac/jpf/wiki/projects/jpf-symbc](http://babelfish.arc.nasa.gov/trac/jpf/wiki/projects/jpf-symbc)

------
biggerfisch
His comment about the "zoo" of data-driven tests finally made sense of the way
my university's major algorithms class did tests. It's not a concept that's
particularly easy to search for when you're working from the instruction "make
tests with this filename format, 'test-<args>'", nor is it something that
strikes one as an actual design pattern (at least not for me).

I do wish the reasoning had been explained to me far earlier as I might have
been able to really recognize the testing as useful and not just another
strange requirement.

~~~
aisofteng
It occurred to me in response to your comment that, pedagogically, it seems so
obvious to tell math students to check their work, yet my universities never
even mentioned unit tests (outside, perhaps, of the sole, non-required
software engineering course, which I believe was a work co-op type of
arrangement).

------
gregorburger
Has anybody experience with testing code that produces graphics (e.g. 3D
engines, etc.)? I saw some articles stating that mocking the API is a good
approach. But how can you test shaders, etc.? Tests based on image comparisons
seem very cumbersome. We currently rely heavily on our QA which does automated
integration tests based on image comparisons. But there is no immediate
feedback for developers with this approach.

~~~
mbenjaminsmith
I'm working on a project where I'm doing boolean ops on 3D meshes that need a
lot of tests.

I do a mixture of traditional unit tests w/asserts along with visual feedback.
I have a rig set up that will dump final (or intermediate) results of
operations to the screen with several tests being presented at the same time.

If I'm actively developing something and am going to be spending a lot of time
in the debugger I can solo a test. Having the additional visual feedback makes
everything go a lot faster.

For higher level stuff having a number of tests on screen at once gives visual
feedback about regressions, which again speeds things up a lot.

This combined approach is the most useful I've found so far.
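
A very stripped-down sketch of that kind of rig (Python; real meshes are
replaced with axis-aligned boxes here just so the example is self-contained):

    import os

    # Stand-in "mesh": an axis-aligned box as (min_corner, max_corner).
    def box_union_bounds(a, b):
        (ax0, ay0, az0), (ax1, ay1, az1) = a
        (bx0, by0, bz0), (bx1, by1, bz1) = b
        return ((min(ax0, bx0), min(ay0, by0), min(az0, bz0)),
                (max(ax1, bx1), max(ay1, by1), max(az1, bz1)))

    def test_union_bounds_contain_both_inputs():
        a = ((0, 0, 0), (1, 1, 1))
        b = ((0.5, 0.5, 0.5), (2, 2, 2))
        result = box_union_bounds(a, b)

        # Traditional asserts catch regressions automatically...
        assert result == ((0, 0, 0), (2, 2, 2))

        # ...and an opt-in dump gives the visual feedback described above
        # when "soloing" a test during active development.
        if os.environ.get("DUMP_TEST_GEOMETRY"):
            print("union bounds:", result)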

------
kisstheblade
Does Linux have a comprehensive test suite (comparable, e.g., to SQLite's)? I'm
wondering because it seems to be quite bug-free, and is a large project, and a
kernel seems to be quite suitable for unit testing (compared to your typical
CRUD app for example).

I suspect there's not much formal testing (at least none done or required by
Linus; some external projects may be available). So it seems that testing isn't that
necessary for a quality project? On the other hand Linux has a large community
so maybe that substitutes for a comprehensive test suite?

~~~
simula67
There is a separate project for testing Linux:
[https://github.com/linux-test-project/ltp](https://github.com/linux-test-project/ltp)

------
vinceguidry
These days, I treat test code the same way as I treat application code,
refactoring and cleaning up as I go. I've noticed that in most projects,
unless you do this, there's a tendency to copy-paste tests, without any
thought given to DRY.

~~~
bpicolo
Going too crazy on DRYing up tests can make test failures hard to track down,
though. Meta-programming to generate tests has only led to pain in my
experience.

~~~
vinceguidry
If you're following the red-green-refactor cycle properly, then you'll have
seen test failures on DRYed tests. Most testing frameworks let you customize
the failure message. It's usually a simple matter of adding more information
about which part of it failed.

I won't meta-program for tests, but I will do things like make a list of
classes, or symbols or whatever, to pass to a loop. Just keep it simple.
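
Something like this is as far as I go (Python; pytest's parametrize does the
same thing with nicer reporting):

    import pytest

    def normalize(name):
        return name.strip().lower()

    # Plain loop, with a message that says which case failed:
    def test_normalize_loop():
        cases = [("  Alice ", "alice"), ("BOB", "bob"), ("carol", "carol")]
        for raw, expected in cases:
            assert normalize(raw) == expected, "failed for input %r" % raw

    # Equivalent with pytest.mark.parametrize: each case is reported as its
    # own test, so a failure already names the offending input.
    @pytest.mark.parametrize("raw,expected",
                             [("  Alice ", "alice"), ("BOB", "bob")])
    def test_normalize_parametrized(raw, expected):
        assert normalize(raw) == expected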

~~~
chaverma
> I won't meta-program for tests, but I will do things like make a list of
> classes, or symbols or whatever, to pass to a loop. Just keep it simple.

This is a case where metaprogramming for tests came in handy.

I had a bug recently that involved someone making a change that violated an
invariant property of a class. To codify this invariant, I was tempted to do
what you did, to make a list of symbols to feed into my test to ensure the
invariant was obeyed. However this bug was caused precisely by someone adding
a new symbol, a method, that didn't obey this property. The test using this
design wouldn't catch this failure. I instead opted to do some introspection
(it's Python, so it was dead simple) on the class to ensure all of its methods
obeyed this invariant. It took a little extra time to implement but in the end
it worked.
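
Roughly like this (Python; the class and the invariant are invented, but the
shape of the test is what I mean):

    import inspect

    class Command:
        """Invariant: every do_* method must accept a `context` argument."""
        def do_start(self, context): ...
        def do_stop(self, context): ...
        # A newly added do_* method that forgets `context` fails the test.

    def test_every_command_method_accepts_context():
        for name, method in inspect.getmembers(Command, inspect.isfunction):
            if not name.startswith("do_"):
                continue
            params = inspect.signature(method).parameters
            assert "context" in params, "%s violates the invariant" % name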

~~~
aisofteng
With the caveat that I haven't invested the full amount of time necessary to
strongly state this opinion publicly, I do feel that metaprogrammed tests are
a case of adding abstraction unnecessarily (which, incidentally, is why I
haven't taken the time to try to use them extensively on a project).

A test should be dead simple to read and understand when someone else new to
the project needs to understand what it tests. Further, when a test fails, the
output should clearly indicate exactly on what line of code an error occurred.

Metaprogramming tests feels to me like a case of a desire for or predilection
for cleverness getting in the way of what the task is actually for.

Tests should tell a story. They should not be subjected to the same methods of
abstraction used in the code they themselves are supposed to test and verify.

------
KennyCason
Writing a module and its tests together, doing them both at the same time, is
my #1 piece of advice. If you're having to run main while developing, I
consider that a sign something is a bit odd.

I also find this a much more favorable approach than pure TDD. In my opinion,
this method is easier to "sell" to other developers.

~~~
aisofteng
My opinion is that TDD can only reasonably be done if you already know what
your implementation must be, which is difficult to do unless

* you have already done it, or something sufficiently similar, before;

* you have formal functional requirements; or

* you have detailed use cases.

Personally, I've recently come to really favor use cases as part of the design
process. The mindset behind them requires some effort to learn, but I find
them an effective vehicle for categorically separating what an application
must do from any consideration for how it might do it.

------
z3t4
I find that bugs occur when you do not fully understand all possible state
combinations and edge cases. So if that is the case I try to break it down into
smaller units that are easier to comprehend. There will still be bugs though,
but they are usually edge cases you didn't imagine would happen, and that's
where I find testing useful, as the next person who touches the code probably
will also miss that edge case.

1) Make changes
2) Manually test & run automatic tests
3) Write automatic tests for each problem/bug discovered
4) Repeat

This only works for decoupled code though. If all units are coupled you must
have automatic tests of everything as no-one can comprehend exponential
complexity.

~~~
aisofteng
What you are describing is the process of identifying what the "units" are
that should be covered by unit tests.

------
petters
I am certainly no religious follower of TDD, but I _do_ think writing tests
before code is useful.

The reason is simple: it tests your tests. I have many times found bugs in
tests that made them always pass.
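
A contrived Python example of the kind of test bug I mean (invented code):

    def is_even(n):
        return n % 2 == 0

    # Buggy test: the assert sits inside a loop that never runs, so the test
    # passes even if is_even is completely wrong.
    def test_is_even_broken():
        cases = []          # oops: someone emptied this while refactoring
        for n in cases:
            assert is_even(n * 2)

    # Writing the test first and watching it fail (red before green) is what
    # flushes out this kind of always-green test.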

~~~
ucho
There is an even better way to test your tests: mutation testing, for
example [http://pitest.org/](http://pitest.org/). It also helps to find dead
code that no longer has any impact on the result.
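
The idea, hand-rolled (pitest automates this for JVM code; the Python below is
only meant to illustrate the concept, and the names are made up):

    # Code under test:
    def apply_discount(price, is_member):
        return price * 0.9 if is_member else price

    # A "mutant" a mutation-testing tool would generate automatically
    # (the constant perturbed):
    def apply_discount_mutant(price, is_member):
        return price * 0.8 if is_member else price

    # This vague test lets the mutant survive, so the suite is too weak:
    def test_weak():
        assert apply_discount(100, True) < 100
        assert apply_discount_mutant(100, True) < 100   # also "passes"

    # This precise test kills the mutant, which is what the tool reports:
    def test_precise():
        assert apply_discount(100, True) == 90.0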

------
amelius
One important concept in testing is "code coverage". The technique is to
(conceptually) place a unique print statement in every branch of every "IF"
statement or loop (every basic block), and then try to write tests until
you've triggered all of the print statements.

EDIT: This explains the concept, and gives a _minimal_ approach to testing
(i.e., you should test more than this, but at least this). Of course, there
are tools to automate this, but not for every (new) language.
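
Spelled out, the "print statement in every branch" idea looks like this
(Python, toy example):

    def classify(n):
        if n < 0:
            print("BRANCH: negative")   # marker 1
            return "negative"
        if n == 0:
            print("BRANCH: zero")       # marker 2
            return "zero"
        print("BRANCH: positive")       # marker 3
        return "positive"

    # Tests are added until every marker has been seen at least once:
    def test_all_branches():
        assert classify(-1) == "negative"
        assert classify(0) == "zero"
        assert classify(5) == "positive"

    # In practice a coverage tool (e.g. coverage.py) records the same
    # information without the print statements.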

~~~
ec109685
That doesn't fully test your code. By testing "if a" and "if b" separately,
you could miss a bug that occurs when a and b are both true.
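
An invented Python example of exactly that gap: both branches are covered by
the two single-flag tests, yet the combined case is still wrong.

    def shipping_cost(is_member, is_rush):
        cost = 0
        if not is_member:
            cost += 5
        if is_rush:
            cost += 10
        # Bug: rush orders for members were supposed to be capped at 8, but
        # no test exercises is_member and is_rush together.
        return cost

    def test_member_only():
        assert shipping_cost(True, False) == 0    # skips both branch bodies

    def test_rush_only():
        assert shipping_cost(False, True) == 15   # runs both branch bodies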

~~~
amelius
> That doesn't fully test your code.

I didn't say that. Be wary of people who say they fully tested your code :)

Anyway, there's another (also not watertight) approach: try to trigger all possible
paths through the code (instead of all basic blocks), but the problem is that
the number of paths can increase exponentially with code size.

The code-coverage approach, in contrast, is very cost-effective. For example,
roughly speaking, it triggers all possible exceptions that your code can
throw.

~~~
emodendroket
> The code-coverage approach, in contrast, is very cost-effective. For
> example, roughly speaking, it triggers all possible exceptions that your
> code can throw.

Well, unless you count uncaught exceptions from things your code calls.

~~~
aisofteng
Or unless you simply forgot to check all possible invalid inputs - which, all
of us being human, will happen periodically, even with code review.

------
tehwalrus
I just took a piece of code with quite good test coverage, and stopped the
couple of places where main was being run during the "unit" test run. Coverage
plummeted, and I realised how much of the code is still untested.

(The code was actually already structured for testing; I just hadn't written
the tests because of that coverage number....)

I am still running main, by the way, but that's a different invocation called
"system tests" which runs if unit tests pass (and after the coverage report).

~~~
aisofteng
This is a prime example of why not to covet code coverage figures. They can
never be taken at face value as an indicator of testing quality or
comprehensiveness.

Incidentally, OP, I empathize - I had to learn it the hard way too.

------
aisofteng
Having responded to several comments here, I am concerned about the fact that
most of the discourse here seems to fail to completely understand what the
goals are of unit testing - and, worse, many comments, despite this omission,
seem to be made with an air of confidence which I could see myself, when I was
a junior developer, accepting as reliable, because of that tone. As of this
writing, I feel that anyone new to unit testing that comes across this overall
discussion will be sent down the wrong path and may not realize it for a very
long time, and so I feel that it is important to outline what I feel are the
most serious misconceptions about unit testing I see here.

* Code coverage's value: code coverage is not a goal in and of itself. Seeing 100% code coverage should not make you feel comfortable, as a statistic, that there is adequate testing. If you have 100% coverage of branching, you might have indeed verified that the written code functions as intended in response to at least some possible inputs, but you have not verified that all necessary tests have been written - indeed, you cannot know this from this simple metric. To give a concrete example: if I write one test that tests only a good input to a single function in which I have forgotten a necessary null check, I will have 100% code coverage of that function, but I will not have 100% behavioral coverage (a sketch of this example follows this list) - which brings me to the following point.

* What to think about when unit testing a function, or how to conceptualize the purpose of a unit test: unit tests should test the behavior of code, so simply writing a unit test that calls a function with good input and verifies that no error occurs is not in the correct spirit of testing. Several unit tests should call the same function, each with various cases of good and bad input - null pointer, empty list, list of bogus values, list of good values, and so on. Some sets of similar inputs can reasonably be grouped into one bigger unit test, given that their assert statements are each on their own line so as to be easily identifiable from error output, but there should nevertheless be a set of unit tests that cover all possible inputs and desired behaviors.

* Unit test scope: A commenter I responded to in another thread had given criticism along the lines that by making two unit tests which test cases A and B entirely independent, you fail to test the case "A and B". This is a misunderstanding of what the scope of a unit test should be in order to be a good unit test - which, incidentally, goes along with misunderstanding the intent of a unit test. A unit test, conceptually, should check that the behavior of one piece of functional code under one specific condition is as intended or expected. The scope of a unit test should be the smallest a test can have without being trivial; we write unit tests this way so that a code change later that introduces a bug will hopefully not only be caught, but be caught with the most specificity possible - test failures should tell the engineer a story along the lines of "_this_ code path behaved incorrectly when called with _this_ input, and the error occurs on _this_ line". More complex behavior, of the sort "if A and B", belongs in an integration test; integration tests are the tool that has been developed to verify more complex behavior. If you find yourself writing a unit test that is testing the interaction of multiple variables, you should pause to consider whether you should not move the code you are writing into an integration test, and write two new, smaller unit tests, each of which verifies the behavior of one input independently of the other.

* Applying DRY to test setup: if you abstract away test setups, you are working against the express intention of each unit test being able to catch one specific failure case, independently of other tests. Furthermore, you are introducing the possibility of systematic errors in your application in the _very possible_ case of inserting an error in the abstractions you have identified in your test setup! Moreover, if you find yourself setting up the same test data in many places, that should not suggest abstracting away the test setup; rather, it should hint at what is likely a poor separation of concerns and/or insufficient decoupling in your software's design. If you are duplicating test code, check whether you have failed to apply the DRY principle in your application's code - don't try to apply it to the test code.
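
To spell out the missing-null-check example from the first bullet (Python,
invented function):

    def average(values):
        # Forgotten guard: values=None or an empty list should be rejected
        # deliberately, but nothing here checks for it.
        return sum(values) / len(values)

    # This single happy-path test executes every line of average(), so the
    # coverage report says 100% -- yet the behavior for None or [] (an
    # accidental TypeError / ZeroDivisionError rather than a deliberate
    # error) is completely unverified.
    def test_average_happy_path():
        assert average([2, 4, 6]) == 4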

And, in my opinion, the most important and common misconception I see here,
and I really feel that it should be more widely understood - and, in fact,
that many problems with legacy code will likely largely stop occurring if this
mindset becomes widespread:

* Why do we write unit tests?

We write unit tests to verify the behavior of written code with respect to
various inputs, yes. But that is only the mechanics of writing unit tests, and
I fear that that is what most people think is the sole function of unit tests;
behind the mechanics of a method there should be a philosophy, and there is.

Unit tests actually serve a potentially (subjectively, I would say "perhaps
almost always") far more vital purpose, in the long term: when an engineer
writes unit tests to verify behavior of the code he has written, he is, in
fact, writing down an explicit demonstration of what he intended the program to
_do_; that is, he is, in a way, leaving a record of the design goals and
considerations of the software.

(Slight aside: in my opinion, being a good software engineer does _not_ mean
you write a clever solution to a problem and move on forever; rather, it means
that you decompose the problem into its simplest useful components and then
use those components to implement a solution to the problem at hand whose
structure is clear by design and is easy for others to read and understand. It
further means (or should mean) that you then implement not only verification
of the functionality you had in mind and its robustness to invalid inputs
which you cannot guarantee will never arrive, but also implement in such a way
that it indicates what your design considerations were and serves as a guard
against changes, made later by someone else (or yourself!), that unknowingly
contradict those considerations.)

Later, when the code must be revisited, altered, or fixed, such unit tests, if
well-written, immediately communicate what the intended behavior of the code
is, in a way that cannot be as clearly (and almost certainly not as
immediately) inferred from reading the source code alone.

In summary, these are the main points that stuck out to me in the
conversations here; I do want to emphasize that the last point above is, in my
opinion, the most glaring omission here, because it is an overall mindset
rather than a particular consideration.

------
KuhlMensch
I like this article; however, I would emphasise that there is a balance to be
struck between writing mini-DSL/fixture-generating code vs. writing simple
data structures (e.g. object literals that mimic JSON).

It's a good idea to take extra care writing the generating code, as any
brittleness is passed on to dependent tests.
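
A rough Python illustration of the two ends of that balance (the data here is
made up):

    # Plain data: verbose but dumb -- nothing to break.
    ORDER_LITERAL = {
        "id": 1,
        "customer": {"name": "Ada", "country": "DE"},
        "items": [{"sku": "A-1", "qty": 2}],
    }

    # Mini-DSL / fixture generator: shorter call sites, but every dependent
    # test now relies on these defaults, so the generator deserves the extra
    # care mentioned above (and ideally tests of its own).
    def make_order(customer_country="DE", qty=2, **overrides):
        order = {
            "id": 1,
            "customer": {"name": "Ada", "country": customer_country},
            "items": [{"sku": "A-1", "qty": qty}],
        }
        order.update(overrides)
        return order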

------
LeanderK
I think that languages with a very good REPL make it easier to write tests.
Because you play around in the REPL, you automatically write a few tests that
you just have to copy. Also, if you design your software to be REPL-friendly,
there is a lower overhead for your tests (easier set-up, etc.).
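
One cheap way to do that copying in Python is doctest: paste the REPL session
into a docstring and it becomes a test. A minimal sketch:

    def word_count(text):
        """Count whitespace-separated words.

        >>> word_count("one two three")
        3
        >>> word_count("")
        0
        """
        return len(text.split())

    if __name__ == "__main__":
        import doctest
        doctest.testmod()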

------
billsix
I avoid running main by testing at compile-time :-).

[https://github.com/billsix/bug/blob/master/demo/src/bug-demo.bug.scm](https://github.com/billsix/bug/blob/master/demo/src/bug-demo.bug.scm)

------
blaix
This is great. It's nice to see an article from someone who doesn't "do" TDD,
but also isn't ranting about how tests are useless. I personally use and
prefer (test-first) TDD but still agree with all of the advice in this
article.

