How I Write Tests (nelhage.com)
349 points by henrik_w on Jan 1, 2017 | 106 comments

Excellent article and it's perfect timing for me!

In the past while contracting I was usually asked to include in my proposals estimates for tests.

The tests failed to be useful, simply because they were written after the feature was actually implemented! We knew better, of course, but this was done in order to quickly get builds a client could look at.

Then when clients wanted to add/change features, guess what got cut to make up for the time? That's right, the tests!

So the tests were always secondary, and the projects tended to suffer as a result.

Recurring "fixed bugs" cost more than just working hours to fix. In the eyes of a client or customer, they are so much worse than a shiny new bug. Tests can help catch recurring bugs before a client/customer does - and save you not only time, but also from losing your customers' confidence.

Now, I'm building my own app and I'm using a disciplined TDD approach. I didn't start my project this way as it seemed overkill when it was just me. But I saw early on that to not practice TDD even solo was madness. It is taking longer, but my actual progress is consistent, and I'm already far more confident about the stability of the app.

Tests written after the feature is implemented can be very useful to keep an eye on regression in the case of teams where multiple developers are checking in changes throughout the day and there is a continuous integration build and test system present.

At two companies and across three teams where I've worked as a developer responsible for automation, I have found that for integration and system type tests, writing tests after feature implementation and properly reporting results following high frequency checkins has allowed me to have a significant impact on the way we work, to the point where some of the leads and management are actually taking testing more seriously than they used to. It's still hard to get rid of the idea that tests are secondary, but they're not as eager to cut them to save time as before, after seeing how they allow us to identify problems faster, and therefore fix them faster.

In this article the author looks like he's talking from the perspective of mostly a single developer or contractor, but for bigger teams tests written after feature implementation are still important, so I would say, based on my experience, don't discount them. That said, his approach to writing tests is something I agree with: rather than rigidly sticking to a particular methodology he is pragmatic, and it's better than doing nothing.

Also he doesn't mention this but solid reporting is so important. I am involved with Tesults (https://www.tesults.com) which is a test results reporting application designed for teams of 10 or more to integrate into their build/test system. If you work for a team of this size and are serious about automated testing check it out. Send me an email if you're thinking of introducing it to your team and have any questions or feedback.

I've mentioned this in a couple other comments here, but it bears repeating here because this comment comes so close to what I feel is truly the most important principle, but falls just short.

Yes, you can construct a regression test suite out of tests written after implementation, and this test suite will hopefully catch incorrect or incomplete alterations made to the software later.

However, the _reason_ it can do that is because these tests will have been written, ideally consciously _because_ of this underlying reason, with the original design of the software in mind; they will be essentially a written record of what the software was _intended_ to do and was designed to do. If one of these tests fails, it means that a design element has been incorrectly or incompletely altered; a well-written unit test of this sort will have an easy to follow flow which will, upon failure, clearly communicate to the programmer which original design element he has not correctly altered and under which specific conditions the error actually occurs.

Writing good tests after implementation can indeed encode a hugely significant amount of information. (Of course, it should also test intended behavior with respect to a wide range of both good and bad inputs, but that is beside the point I want to drive home here.)

Well enough written unit tests can be used as documentation of an application's design!

[N.B. The above thesis really only holds if the authors of the original code write unit tests, ideally as they develop while any design considerations are fresh in memory. As discussed in another comment, sending other engineers back later to try to figure out what the design was and then unit test their interpretation of what they think it must have been will almost definitely not give anywhere near the same quality.]

>> Well enough written unit tests can be used as documentation of an application's design!

That sounds really cool. Is there an open source project that you respect that has achieved this, that you can point me to? I'd like to look at it.

Have a look at the unit tests for the components and services in https://github.com/smaato/ui-framework. We designed the test suites to explore and explain how the interface of each module works. So reading the tests is one way to get a feel for how each module is designed to be used.

While a test suite gives you a collection of examples of use, it doesn't tell you the intended purpose and motivation for the software, why it is designed the way it is, provide any argument for correctness or best-, typical- and worst-case performance, or discuss what trade-offs were made, and why. It is often hard to divine the underlying rules from a set of examples, and they don't go out of their way to point out which are the corner cases. There is a great deal of information in the design of any non-trivial piece of software that is not efficiently expounded by tests.

> The tests failed to be useful, simply because they were written after the feature was actually implemented!

I find after-the-fact asserts, unit tests, and static analysis still useful:

  - I catch incorrectly handled edge cases that I haven't hit "in the wild" yet (although others may have!)
  - I think of new edge cases to handle
  - I can be more aggressive in future refactoring
  - I can be more confident in excluding the code from my bug hunts

Many of the issues I catch this way are the kind of issues that threaten to turn into really nasty heisenbugs - such as the occasional missing mutex lock. After all, all the low hanging fruit was probably caught when testing the feature locally ;)

The other thing is I tend to be kind of exploratory in style and if I have written a bunch of tests I'm more hesitant to change something, even if it isn't really working.

I would probably be willing to say that after the fact asserts useful during development should be considered as candidates to be turned into unit test cases, one to one. Few people use asserts, but I don't see why this methodology couldn't be rather effective, given good practices with asserts.

Parent is talking about functional test cases, i.e. something that can be tested manually from the command line. Unit tests are not functional or integration tests, so yes, unit tests are useful even after functional and integration tests are written.

The concept of a functional test seems to vary quite a bit more from engineering group to group than your usage indicates.

Sometimes, some unit tests border on functional tests and vice versa, the distinction blurring between two adjacent test cases in a unit test suite.

And sometimes integration suites end up testing single functionalities at a time, depending on the design and/or requirements of the application, and could reasonably be called functional tests.

On my team, I hesitate to emphasize "functional tests" as a standalone group equivalent in semantic distinction to unit and integration tests; the proper scope of a "functional" test is defined, to me, by the way the functionalities of an application were designed and decoupled, which depends on the particular project.

(If your organization uses functional requirements and specifications for software design, it's simpler - a functional test verifies a functional requirement. Unfortunately, having functional requirements as a part of the engineering process is significantly rarer than it should be.)

It shouldn't really matter whether the unit tests are written before or after. What matters is how testable the source code is, and making sure the unit tests add value.

Of course, writing the tests before the actual source code of the feature enforces the fact that it's testable, but it's mainly a side effect, there's nothing inherently better.

The microdesign flaws that stem from TDD (described in [1]) were real in my personal case, and my code is way better in the long term with the unit tests written afterwards, because I tend to modify my design a lot while writing code.

[1] http://beust.com/weblog/2014/05/11/the-pitfalls-of-test-driv...

Tests only fail to be useful because they were not written with an explicit purpose in mind; good tests are chosen to explicitly verify behavior that is worth verifying.

If you have a good design in mind before starting development, which is, after all, an ideal situation, it could be argued that it is reasonable to write tests after most of the development, not only to verify robustness to bad input as per the usual, but also as a way to encode a demonstration of the intent of the design.

I agree that this is an excellent article. I also agree that the situation you spoke to happens all too often. I recently worked on the backend API for a mobile app. Because the company wanted it done in a month, they said "no tests!" Now that they've released, they want me to go back and crowbar a bunch of tests into place. I can guarantee the next push they have for a release will be under "no test" conditions again though.

Regarding solo TDD, it is slow going at first, but you quickly make up that time later on, when you add code, break your tests, and can quickly jump in and fix the issue. Saves your reputation too, I think.

Interesting post. I have found myself doing a lot of the same things through my own experience. One thing I have been striving for lately is to build functionality using relatively more small methods that accept parameters as opposed to using state which is stored in instance variables. I find that this makes my life easier when writing tests and also helps identify corner and edge cases that I may not have thought about. And when something does break it is usually very easy to add a failing test, fix the code, and see that everything is now working. Also makes me a lot more confident when I go to refactor. Sandi Metz gave a great talk on the gilded rose problem that explores these concepts.
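
A rough sketch of the difference, in Python; the names `StatefulCart` and `cart_total` are made up for illustration and are not from the article or the talk. The stateful version needs its fields arranged before every check, while the parameter-passing version can be probed at its edges directly:

```python
class StatefulCart:
    """Hypothetical example: total() depends on state that was set elsewhere."""

    def __init__(self):
        self.prices = []
        self.discount = 0.0

    def total(self):
        return sum(self.prices) * (1 - self.discount)


def cart_total(prices, discount=0.0):
    """Same computation, but everything the result depends on is a parameter."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return sum(prices) * (1 - discount)


def test_cart_total_edge_cases():
    # Edge cases are easy to list when inputs are explicit.
    assert cart_total([]) == 0
    assert cart_total([10, 20], discount=0.5) == 15
    assert cart_total([10], discount=1.0) == 0
```

Writing the parameter list down is also what surfaces the corner case here: once `discount` is an argument, the question of what happens when it is out of range becomes hard to ignore.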

It's called Inversion of Control (IoC)[1]. It arises naturally when code is used in more than one place.

[1]: https://en.wikipedia.org/wiki/Inversion_of_control

How do you avoid the parameters from getting bloated? Like having to pass the same dependencies into every method.

It's a judgement call to be sure. If I start seeing a pattern of adding a bunch of parameters I stop and rethink what I am doing. Usually there is a structural problem that can be fixed.

Maybe by using currying/partial application?
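
For what it's worth, a minimal sketch of what that could look like with Python's `functools.partial`. The function `send_report` and its `outbox`/`log` dependencies are hypothetical stand-ins (plain lists, so the example stays self-contained):

```python
from functools import partial


def send_report(outbox, log, recipient, body):
    """Hypothetical function whose first two arguments are dependencies."""
    log.append(f"sending to {recipient}")
    outbox.append((recipient, body))
    return len(outbox)


# Bind the dependencies once; callers then pass only what varies.
outbox, log = [], []
send = partial(send_report, outbox, log)

send("a@example.com", "hello")
send("b@example.com", "world")
```

In a test you would bind fakes (as the lists are here) and in production bind the real objects, so the call sites stay short without hiding the dependencies in instance state.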

I find myself doing the same.

> My final, and perhaps more important, advice is to always write regression tests. Encode every single bug you find as a test, to ensure that you’ll notice if you ever encounter it again.

This is a nice ideal but in practice can be really hard. For example, say I fix a grammar mistake in a text label (or, hard mode, a code comment). One could write a test, perhaps integrating a grammar checker, but this is a lot of work, for a low reward. So where do we draw the line?

It's easy to test things that have inputs and output representable as binary blobs. It's hard to test that this animation is smooth, that the build succeeds on obscure systems, that this graphic is rendered acceptably, that this event happens when the user connects that device. Or, rather, it's easy for a person to manually test any of these things, but hard to write an automated test for them.

A common failure mode is to capture the input and thereby isolate the system. If 95% of your bugs are due to interactions with your dependencies, isolating your system is going to find very little compared to a full integration test. This is a great way to make the tests pass, and also make the tests useless.

Maybe we conceptualize testing on two axes: automated testability and manual testability. If you try to write tests for components that have poor automated testability, you'll hit a valley of pain: spurious failures, excessive mocking of your dependencies, etc., that you can only climb out of with great effort (e.g. a robot inserts an HDMI cable). If these components have good manual testability, then it may be more cost effective to just test it manually.

This post is timely for me because it happens to encapsulate the way I've found a balance recently between test driven development and "test after writing" development that seems to be very effective for me.

As the article notes, writing all tests first is unreasonable because you won't know all implementation details until you have to make them; the tests I write first are thus functional tests, nowadays with cucumber.

Writing tests after coding is lacking, philosophically, because you often spend your time defining the abstractions and then just rewriting a verification of that abstraction in tests, plus some null checks.

The balance I've been using has been to write tests for abstractions I come up with, one by one. If an abstraction is decoupled and encapsulated, the unit tests come naturally. If I have to write a lot of mocks for an abstraction, that often tells me it isn't cleanly decoupled or simplified.

Furthermore, as you write tests as you go this way, you often find yourself writing the same support code more than once, at which point you notice it and find abstractions in that support code; this ends up explicitly giving you a view of what conscious and subconscious assumptions you have about what inputs you are expecting and what assumptions you have made. This is often enlightening.

>My final, and perhaps more important, advice is to always write regression tests. Encode every single bug you find as a test, to ensure that you’ll notice if you ever encounter it again.

This is good advice.

On a previous (technical-debt ridden) project I did a little measuring and there was a pretty clear hierarchy of test value - in terms of detected regressions:

1) Tests written to invoke bugs.

2) Tests written before implementing the feature which makes them pass.

3) Tests written to cover "surprise" features (i.e. features written by a previous team that I never noticed existed until they broke or I spotted evidence of them in the code).

4) Tests written after implementing the feature.

5) Tests written just for the sake of increasing coverage.

Surprisingly 5 actually ended up being counter-productive most of the time - those tests detected very few bugs but still had a maintenance and runtime overhead.

What do people think about writing explicit regression tests for bugs found through fuzz testing? I've tended to lean towards not writing them, depending on continuing fuzz testing to ensure things stay clean. This may of course be somewhat naive, but I also fear cluttering my test suite with really obscure test cases.

I think the premise of your question confuses considerations, unless I've misunderstood.

On all projects I own, the policy is that a bug fix will not be merged into a codebase without comprehensive unit testing demonstrating the case in which that bug was discovered, and that it has been resolved.

I do not understand why it matters _how_ the bug was discovered. If fuzz testing discovered that function foo tries to dereference a null pointer given input "ABABAB", then I would expect the engineer who chose to address that bug to investigate what property of "ABABAB" is the unaccounted for property, account for it, and then write a unit test calling foo with input "ABABAB", along with several other inputs that share the same discovered underlying property.

Fuzz testing may be a different method of testing, but the end result is, regardless, that you have discovered an input that your application hasn't been designed to handle properly and that needs to be demonstrably fixed, whatever it may be in particular.
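
To make that concrete, here is a sketch in Python. Everything about it is an assumption for illustration: `foo` is taken to return the first character that occurs exactly once, and the "underlying property" of "ABABAB" is taken to be that no such character exists, which the original code supposedly did not handle.

```python
def foo(text):
    """Hypothetical function under test: first character occurring exactly once."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    for ch in text:
        if counts[ch] == 1:
            return ch
    # Assumed fix: previously this case fell through to code that used the
    # missing result; now "no unique character" is an explicit outcome.
    return None


def test_foo_fuzz_regression():
    # The exact input the fuzzer reported.
    assert foo("ABABAB") is None
    # Other inputs sharing the same property: no character occurs exactly once.
    assert foo("AABB") is None
    assert foo("") is None
    # A nearby input without the property, to pin down the normal behavior.
    assert foo("ABA") == "B"
```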

Wouldn't you want to write those explicit tests anyway, to run the troublesome input in isolation while fixing the bug? With the tooling to fuzz, they should be one-liners or close, hardly cluttering. One time, working on an extremely fuzz-friendly function that was crazy rich in corner cases I even made the error message of the fuzz loop include the one-liners that would execute the failing inputs, ready for copy&paste. Testing never felt more productive.

I actually don't think that heavy fuzzing has a place in an automated test suite at all. Test suites should be fast and 100% reproducible at all times. Then explicit regression tests for the discovered cases are the only way. (I do occasionally allow myself to include short fuzz loops with fixed RNG initialization, but those are more on the "shameful secrets" end of the spectrum)
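
A small sketch of both ideas together, in Python: a short fuzz loop with a fixed RNG seed whose failure messages are ready-to-paste one-line tests. The function under test (`clamp`) and the property being checked are placeholders, not the function from the story above.

```python
import random


def clamp(value, low, high):
    """Placeholder function under test."""
    return max(low, min(value, high))


def fuzz_clamp(seed=1234, rounds=200):
    """Return one-line reproducers for every failing input (empty if none)."""
    rng = random.Random(seed)  # fixed seed: every run sees the same inputs
    failures = []
    for _ in range(rounds):
        value, a, b = (rng.randint(-50, 50) for _ in range(3))
        low, high = min(a, b), max(a, b)
        result = clamp(value, low, high)
        if not (low <= result <= high):
            # Emit the failing case as a line that can be pasted into the suite.
            failures.append(
                f"assert {low} <= clamp({value}, {low}, {high}) <= {high}"
            )
    return failures
```

Because the generator is seeded, the loop is reproducible, and any line it reports can be copied into the suite as an explicit regression test that keeps running even if the loop is later removed.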

Haven't worked with fuzz testing myself, but it sounds like something I'd lean towards writing. Those obscure test cases are exactly the thing you don't tend to find in manual testing, and AFAIK fuzz testing is random enough that you can't be sure that every run will exercise the same bug.

One benefit I find with code coverage as a goal is that it highlights unused or unnecessary branches, and can encourage me to think about how my code is organized such that code is more easily accessible.

Put another way, those branches that are rarely touched or hard to get to can become a surprise when they are actually reached in some unique situation.

I guess in this case it isn't really the test itself that is useful but instead the requirement to at least hit each branch can cause me to design and organize the code better.

Code coverage reports are a visualization tool that should be used to quickly get a lay of the land. Code coverage itself should not be the goal; the goal should be careful coverage of behavior that is composed of discrete units amenable to unit tests.

I once made the mistake, as a new lead, of implementing a 100% code coverage policy; I thought that I was expressing the intent of covering all possible behavior. What ended up happening is that the team focused on the metric and lost sight of the goal of unit testing, which is to test behavior. We ended up with people submitting PRs containing unit tests for object getters and setters but not testing that trying to set a null value is properly handled.

What that experience taught me is that code coverage is only a tool, and a tool is only useful if used correctly.

To drive the point home: having 100% coverage of all branches means that all written code is tested, but it does not mean that all code that needs to be written has been written. Unit tests should verify behavior, not just execute whatever code has been written.

Yeah I agree, once a metric itself becomes the goal then you end up with some unmaintainable tests just to hit a block. I guess I would prefer to see un-hit blocks eliminated, not obtuse tests written to try and hit them. But like you said, I can see how if test coverage is the only end goal, you could end up with some pretty useless tests just to hit some random block.

I don't like to write tests ahead of code, but the idea of "Avoid running main" is very powerful and something that helps me a lot. Usually in a new project I try to just use tests as the playground for the evolving code, and delay actually creating a working application for as long as I can (not so hard in non UI apps). In an existing project you delay integrating your new module with the whole app.

Sometimes my tests just start as a bunch of prints to see the results visually. Then when I'm happy with the results I convert these prints to assertions and the playground becomes a real test suite.

It is a nice quote, but frankly I would not want that to be the takeaway. Remembering "Avoid running main" is just a principle without reasoning or justification, past remembering it sounded right when first read.

The underlying concept is that unit tests should verify behavior of minimal units of functional code. If you are running the application, you are testing much more high level functionality. "Running main" and "running unit tests" are completely different things, and I would rather the principle behind this difference be the takeaway, rather than just "avoid running main."

The advice really is to avoid the quick dopamine hit of running the app and seeing the new feature work by manual testing.

It's a motivation thing, and I think it can be very valuable tip how to motivate yourself to keep up your unit test suite (for those of us who need this behavioral trick).

yeah but we are human and flawed, at least I am. this is just a practice that forces me to properly test and be less lazy, because I can't test it manually and move on.

I don't like "avoid running main", I feel like it would undermine "holistic" engineering, since you're missing out on using the feature in the context of the application.

> However, it’s also largely-wasted effort! A manual test only verifies the current state of the code base.

I don't think this is true, at least for GUIs, because manually using the UI validates that it works for a human. An automated GUI test really only covers a tiny, tiny subset of the interaction between the human and the machine (yes, ideally "someone else would do it", but how many companies did you work for that actually have UX reviews and dedicated UX testers? Not to speak of open source).

Also and especially for larger applications I strive to get to the dogfooding point as fast as possible, because dogfooding provides invaluable feedback, not only for GUI and UX, but also for the sort of issues that tend to get overlooked in tests and reviews, eg. leaks in long-running processes, or structures that don't scale well. Dogfooding also tends to mean that the overall structure becomes relatively stable quickly, which makes writing a comprehensive test suite easier at that point, since you don't have to constantly work on both tests and implementation, but can concentrate on either.

Writing this reply made me think that I'm probably even less of a TDD person than I thought before.

It really depends on which application. In the past few years I've been mainly writing server-side infrastructure libraries and services with no GUI at all. In those cases I'm writing stuff bottom-up anyway, so while I'm writing the bottom layers (data structures, interfaces, algorithms), I'll be testing them and playing with them with no big app wrapping them. It's both faster to write the bottom layers like that, and eventually you end up with pretty solid tests. So it's not a compromise for me in these cases.

It depends hugely on the size of your application, and of your team. For small applications and a team size of up to three devs, your approach might be preferable. But I can't imagine that it would scale well beyond that. You will be able to test that your newly introduced feature works as expected, but you will almost certainly not be able to verify that you didn't break anything meanwhile. In general, you write tests mainly not to ensure that what you just developed works, but to ensure that it keeps working while the application evolves.

A good way to encourage modular testing over manually clicking through whatever feature you just added, is to make the "main" startup painfully slow or include manual steps other than just hitting "run" in an IDE.

If the effort of writing a few tests is smaller than actually running the application, the tests will be written because developers are lazy (instead of the other way around).

In the good old days of massive JBoss EE applications, this was taken care of automatically by the app server :)

Nice to see a non religious post about testing.

Particularly enjoyed the emphasis on regressions. I converted to testing when working on a relatively complex data transformation. This was replacing an existing, scary data transformation process that was hard to test (we'd run new code for a few days and do a lot of manual examination), so I made extra certain to design the new system so it was testable. Catching regressions in test, especially for data processing, is just so much better than catching and repairing them in production.

> I fully subscribe to the definition of legacy code as “code without an automated test suite.”

> I’ve never really subscribed to any of the test-driven-development manifestos or practices that I’ve encountered.

I feel the exact opposite. I've worked in projects with a lot of legacy code, both with BDD and with UTs that we added later on.

Even with the best intentions the latter always failed: we always ended up writing a lot of unreadable tests that had no meaning and that we were afraid to look at. However, when I was working in a team fully committed to BDD, we were looking at the tests before looking at the code, the tests were at the center of the development process, and we were able to write fast, solid, and simple tests.

Nowadays, I'm more interested in articles that understand that tests can be a pain too. And tbh I don't really trust articles that aim at high coverage without talking about the different challenges that come with tests.

This isn't exactly a fair comparison, in my opinion. Legacy code that was written before unit testing had become a habit tends to have a design that isn't always easily covered by unit tests; furthermore, when you give a team of engineers legacy code and ask them to add tests, they have to trace the source, make an interpretation of what they perceive the design considerations to have been, and then write tests to that. What you often end up with, though, is starting by unit testing the easier to understand pieces of code, checking for robustness to some bad inputs, and then somewhat skimping on unit tests on the code that most embodies the original authors' design considerations.

Which, to be fair, is a highly nontrivial task that will realistically never be completed as well as if the original authors had written unit tests demonstrating the intent of their design. And the comparison you should have been making is to that scenario.

People into testing everything should also remember there's test generation tools in commercial and FOSS space to reduce work necessary to do this. Here's two examples for LLVM and Java respectively. Including the KLEE PDF since the results in the abstract are pretty amazing.




His comment about the "zoo" of data-driven tests made the way my university's major algorithms class did tests finally make sense. It's not a concept that's particularly easy to search for when you're working from the command of "make tests with this filename format, 'test-<args>'", nor is it even something that strikes one as something that might be an actual design pattern (at least for me).

I do wish the reasoning had been explained to me far earlier as I might have been able to really recognize the testing as useful and not just another strange requirement.

It occurred to me in response to your comment that, pedagogically, it seems so obvious to tell math students to check their work, yet my universities never even mentioned unit tests (outside, perhaps, of the sole, non-required software engineering course, as I believe it was a work coop type of arrangement).

I actually wrote an article about this strategy: https://medium.com/@tinganho/baseline-acceptance-driven-deve...

Does anybody have experience with testing code that produces graphics (e.g. 3D engines, etc.)? I saw some articles stating that mocking the API is a good approach. But how can you test shaders etc.? Tests based on image comparisons seem very cumbersome. We currently rely heavily on our QA, which does automated integration tests based on image comparisons. But there is no immediate feedback for developers with this approach.

I'm working on a project where I'm doing boolean ops on 3D meshes that need a lot of tests.

I do a mixture of traditional unit tests w/asserts along with visual feedback. I have a rig set up that will dump final (or intermediate) results of operations to the screen with several tests being presented at the same time.

If I'm actively developing something and am going to be spending a lot of time in the debugger I can solo a test. Having the additional visual feedback makes everything go a lot faster.

For higher level stuff having a number of tests on screen at once gives visual feedback about regressions, which again speeds things up a lot.

This combined approach is the most useful I've found so far.

I have problems with this as well. Sometimes there are obvious things that can be tested on the resulting image. For example when applying a blur on a grey image with one white pixel in the middle, I know how I expect the sum of the pixel values to change, and I know the upper and lower bounds of valid pixel values.

After the fact unit tests that verify that abs(expectedImage - resultingImage) < e are also helpful. But unless you write some modeling application that has to produce exact results, a lot of code is typically about what looks good, not about what's mathematically correct. Those parts are probably impossible to properly test automatically.
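
As a sketch of the checkable part, in plain Python on a list-of-lists greyscale image. The blur here is an assumed 3x3 box blur with replicated edges, chosen only because it is easy to reason about; a real filter would need its own expected properties and tolerance.

```python
def box_blur(image):
    """3x3 box blur with replicated edges (assumed filter, for illustration)."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out


def grey_image_with_white_pixel(size=5, grey=0.5, white=1.0):
    image = [[grey] * size for _ in range(size)]
    image[size // 2][size // 2] = white
    return image


def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def test_blur_properties(eps=1e-9):
    image = grey_image_with_white_pixel()
    blurred = box_blur(image)
    # Property 1: this blur only spreads the white pixel, so the sum is unchanged.
    assert abs(sum(map(sum, image)) - sum(map(sum, blurred))) < eps
    # Property 2: every output value stays within the input's value range.
    assert all(0.5 - eps <= v <= 1.0 + eps for row in blurred for v in row)
    # After-the-fact comparison: abs(expectedImage - resultingImage) < e.
    expected = [[0.5] * 5 for _ in range(5)]
    for y in (1, 2, 3):
        for x in (1, 2, 3):
            expected[y][x] = 5.0 / 9.0
    assert max_abs_diff(expected, blurred) < eps
```

None of this says whether the result looks good, which is the part that still needs a person.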

Speaking in generalities and of an ideal world:

Whatever computations you are making should be composed in such a way that distinct, reused computation types are implemented in units, perhaps conceptually one type of transformation per function. Each of these functions should be tested by a collection of unit tests which verify that it performs the correct computation on hard-coded, mock data, both good and bad, and that it fails when it should.

In cases where multiple transformations are applied together and there is more complex behavior, the application should be designed so that those combinations of "unit" behaviors are done for a specific purpose, with a specific goal (e.g. warp this image and then project its shadow); you should have an integration test suite that verifies this more complex behavior performs the function it is supposed to. Here, you can probably be less strict about checking inputs since bad inputs should be covered by unit tests, but you should verify that there is proper error handling of multiple units failing.

These should all run using mock data as part of your CI process, and no code should ever be merged that has failing checks; new code should not be merged without code review verifying that testing is comprehensive.

Finally, there should be a QA process that verifies the final product when being considered for release, using real data and manual validation, ideally with acceptance criteria in hand that was written before the code changes were begun.

In the end, it shouldn't even really come down to trying to mock your APIs; your abstractions should, if at all possible, be decoupled enough that they perform transformations of data that they are given, regardless of whether it came from another API or not - so instead of considering "mocking the APIs" you call the relevant functions with example data of the sort you receive from the APIs that will be actually used.
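
A very small sketch of that idea in Python, using made-up mesh helpers: the functions only transform the data they are handed, so the tests pass in hand-written vertices of the shape a loader or graphics API would supply, and nothing needs to be mocked.

```python
def translate_vertices(vertices, offset):
    """Hypothetical unit: move every (x, y, z) vertex by the given offset."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in vertices]


def bounding_box(vertices):
    """Hypothetical unit: axis-aligned bounds as (min_corner, max_corner)."""
    if not vertices:
        raise ValueError("bounding_box needs at least one vertex")
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def test_translated_triangle_bounds():
    # Example data standing in for what a mesh loader would return.
    triangle = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
    moved = translate_vertices(triangle, (1, 1, 1))
    assert bounding_box(moved) == ((1, 1, 1), (2, 3, 1))


def test_bounding_box_rejects_empty_mesh():
    # Bad input: the unit should fail when it should.
    try:
        bounding_box([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty mesh")
```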

Does Linux have a comprehensive test suite (comparable e.g. to SQLite)? I'm wondering because it seems to be quite bug free, and is a large project, and a kernel seems to be quite suitable for unit testing (compared to your typical CRUD app for example).

I suspect there's not much formal testing (at least done or required by Linus, some external projects may be available). So it seems that testing isn't that necessary for a quality project? On the other hand Linux has a large community so maybe that substitutes for a comprehensive test suite?

There is a separate project for testing Linux: https://github.com/linux-test-project/ltp

These days, I treat test code the same way as I treat application code, refactoring and cleaning up as I go. I've noticed that in most projects, unless you do this, there's a tendency to copy-paste tests, without any thought given to DRY.

Copy-paste-change tests may be interesting, though. Too much fanciness makes tests difficult to follow in my experience.

Ideally, tests should take three steps:

  - setup
  - perform a single operation (the test)
  - check operation result is the expected one
If each step is simple and readable, the test is actually easy to follow, even if a lot of cases use almost the same initial setup and expected result, which makes copy-paste-change a good way of adding tests.
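As a sketch of the three steps, here are two copy-paste-change tests against a made-up `Cart` class (the class and its methods are assumptions for illustration only):

```python
class Cart:
    """Hypothetical class under test."""

    def __init__(self):
        self.items = {}

    def add(self, name, price, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        _, old_quantity = self.items.get(name, (price, 0))
        self.items[name] = (price, old_quantity + quantity)

    def total(self):
        return sum(price * qty for price, qty in self.items.values())


def test_total_of_single_item():
    # setup
    cart = Cart()
    cart.add("pen", 2, quantity=3)
    # perform a single operation (the test)
    result = cart.total()
    # check operation result is the expected one
    assert result == 6


def test_total_of_two_items():
    # setup (copied from the test above, then changed)
    cart = Cart()
    cart.add("pen", 2, quantity=3)
    cart.add("book", 10)
    # perform a single operation (the test)
    result = cart.total()
    # check operation result is the expected one
    assert result == 16
```

The second test repeats most of the first, but each one can be read top to bottom without looking anywhere else.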

If a change in functionality is big enough to require changing a lot of tests, this approach makes it obvious that the change alters behaviour in lots of cases, which is good to keep in mind. Most times a simple editor replace suffices, but it still shows how big the change is.

Refactoring should be focused on making those steps easy to write and read...

Yeah, I've done the same under something I've learned as AAA; Arrange, Act and Assert, which I think is the more familiar naming of a system like this. C2 has some more information about it over here: http://wiki.c2.com/?ArrangeActAssert

I often find it interesting the propensity to name things. In my opinion, this approach should be self-evident once the concept of unit tests is correctly understood - prepare the data, run the code, check the output.

I suppose this is another case where a common approach is given a name so that people can refer to it to each other and pass it onto others. I wouldn't have thought to name this one, though - I would rather teach a junior engineer how to think about what a test should do, instead of telling him to do "AAA".

With experience, I have seen the effect of something being given a name eventually leading to less experienced people treating it as a universal principle and wielding it as if all they have is a hammer, perhaps before having formed the habits of checking assumptions before making decisions, etc.

I'd argue that having setup in your tests is an anti-pattern. The setup should be, at most, a single message send to the object under test. Simply call it in your expectation. If you're putting significant code in your test, consider moving that logic to the app.

Quite a bit of the time, you don't want to rely on application logic to do your setup for you. You end up needing the application logic to put things into the DB, without any bugs, in order to be sure your test is actually testing the part of the code it's meant to be testing.

Allowing your application logic to do this for you creates all sorts of opportunity for hard to debug issues that have nothing to do with the thing you were actually testing. Putting data into the test DB directly makes for more setup, but makes for much more isolated test cases and much more certainty that your tests are doing what you think they're doing.

It also means you can write tests as you go without having to build a whole level of application logic to support your single test.

I think a problem with this is that when you bring an actual database into the test, it ceases to be a unit test and becomes an integration test (not that integration tests aren't useful). I would prefer for the application to be separated enough from the database that I can test the business logic entirely on its own.

I would suggest that, aside from at the boundaries of a system, if you find yourself requiring complex setup code then you should evaluate whether your abstractions are neatly decoupled and reduced to minimal necessary complexity.

I disagree. IMHO, modifying your app code for the sake of test code is an anti pattern. It should be viewed as an unfortunate yet necessary, rather than celebrated, action. It taints your design and makes your application harder to understand.

I disagree. If you write tests only after you implement your application and find that they are difficult to test, it is likely not that you are using an anti-pattern in the "unfortunate yet necessary" action of making your code more testable - it is, instead, far more likely, that by failing to write tests as you wrote your implementation, you implemented a poor design that does not decouple abstractions as well as they could be.

It is not a chore to unit test code, it is an extremely useful development tool that forces you to check your assumptions and verify your design as you go to give you a better product.

This comment reflects the sort of attitude towards testing that takes a lot of effort to get junior engineers to unlearn.

Juniors make plenty of mistakes. I would rather they make them and then have someone explain why they should change their code than have them rely on "testable code = better code". I would admit that a junior working alone, or without code review/a mentor, is probably better off relying on that mantra though. Or, perhaps, even anyone designing an application significantly larger than any they have designed in the past (without any outside input).

However, everything has a cost, even abstractions. The more powerful the abstraction, the higher the cost. I find that for most of our projects, the sorts of abstractions unit testing forces on us are much more powerful - and thus higher cost - than necessary for the task at hand. If you are not taking advantage of that power, then all you really have left is the high cost.

Nobody declared code that is easy to test (verify) to be an anti-pattern. Quote, please.

If I have some object I want to test in multiple ways, I find it more convenient to create a single instance during setup and then test various aspects of just that single instance. Since the constructor of the object in question needs a bunch of values passed in, and each unit test only checks the effects of one at a time, I'm just going to copy and paste a bunch of object initialization code if I don't have a setup step first.

This is not a good practice because a unit test should be able to assume that no other code can possibly modify the data it is using. Later, another engineer could possibly come in with the intent of modifying a single unit test in accordance with an isolated functional change in such a way that the data being used by other unit tests gets modified.

Then some poor fellow ends up redesigning the test suite to actually follow the assumptions of a unit test's test data being only ever accessed by that unit test. I have been that fellow, and it can be, bluntly, quite a pain in the ass; as you're doing it, you can't help but think, "whoever wrote this was too lazy to set up the test data properly, I shouldn't have to spend the time to fix this now."

Ah yes, I should have stated that the object is immutable.

Where are you getting the object to which you're going to send messages?

From the class.

    expect(Klass.new.special_case).to have_this_behavior
Anything more complicated in test code should be considered an anti-pattern.

Careful getting too DRY with your tests. Far too many projects have tests that should be failing but aren't because of some silly side effect, caused by DRYness and lack of isolation.

I'd go as far as to say that attempts at DRY cause more problems in test suites than copy-paste coding, many of which never even get caught. In a test suite, this is very bad.

We actually ran into this as a problem: our tests were so DRY that it was really hard to adapt them as the behaviour of our application changed.

We've since gone more DAMP (http://stackoverflow.com/questions/6453235/what-does-damp-no...): tests should be descriptive and meaningful by themselves, not abstract, concise and DRY.

It makes for more lines of test code but they are simpler to understand and adapt.

When you abstract away test code, you are abstracting code written to test a software application's design, and thus your abstractions (and test data) necessarily mirror your application design.

This is why I commented elsewhere that DRY should be applied not as a coding principle to all code, but as a design principle to software design.

Going too crazy on non-DRY tests can make test failures hard to track down, though. Meta-programming to generate tests has only led to pain in my experience.

If you're following the red-green-refactor cycle properly, then you'll have seen test failures on DRYed tests. Most testing frameworks let you customize the failure message. It's usually a simple matter of adding more information about which part of it failed.

I won't meta-program for tests, but I will do things like make a list of classes, or symbols or whatever, to pass to a loop. Just keep it simple.

> I won't meta-program for tests, but I will do things like make a list of classes, or symbols or whatever, to pass to a loop. Just keep it simple.

This is a case where metaprogramming for tests came in handy.

I had a bug recently that involved someone making a change that violated an invariant property of a class. To codify this invariant, I was tempted to do what you did, to make a list of symbols to feed into my test to ensure the invariant was obeyed. However this bug was caused precisely by someone adding a new symbol, a method, that didn't obey this property. The test using this design wouldn't catch this failure. I instead opted to do some introspection (it's Python, so it was dead simple) on the class to ensure all of its methods obeyed this invariant. It took a little extra time to implement but in the end it worked.
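A sketch of that introspection approach. The real class and invariant aren't given, so this assumes a made-up `Query` builder whose invariant is "every public method returns a new instance and leaves the original untouched", and that every public method takes one argument:

```python
import inspect


class Query:
    """Hypothetical immutable query builder."""

    def __init__(self, parts=()):
        self.parts = tuple(parts)

    def where(self, clause):
        return type(self)(self.parts + (("where", clause),))

    def order_by(self, column):
        return type(self)(self.parts + (("order_by", column),))


class QueryWithBug(Query):
    """Someone later adds a method that mutates in place."""

    def limit(self, n):
        self.parts += (("limit", n),)
        return self


def public_methods(cls):
    # Discover methods by introspection instead of a hand-kept list,
    # so newly added methods are picked up automatically.
    members = inspect.getmembers(cls, predicate=inspect.isfunction)
    return [name for name, _ in members if not name.startswith("_")]


def methods_violating_invariant(cls):
    violations = []
    for name in public_methods(cls):
        original = cls()
        result = getattr(original, name)("x")
        unchanged = original.parts == ()
        if not isinstance(result, cls) or result is original or not unchanged:
            violations.append(name)
    return violations


def test_every_public_method_returns_new_instance():
    assert methods_violating_invariant(Query) == []
```

Running `methods_violating_invariant(QueryWithBug)` reports `limit`, which a hard-coded list of method names written before `limit` existed would not have covered.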

With the caveat that I haven't invested the full amount of time necessary to strongly state this opinion publicly, I do feel that metaprogrammed tests are a case of adding abstraction unnecessarily (which, incidentally, is why I haven't taken the time to try to extensively use them on a project.)

A test should be dead simple to read and understand when someone else new to the project needs to understand what it tests. Further, when a test fails, the output should clearly indicate exactly on what line of code an error occurred.

Metaprogramming tests feels to me like a case of a desire for or predilection for cleverness getting in the way of what the task is actually for.

Tests should tell a story. They should not be subjected to the same methods of abstraction used in the code they themselves are supposed to test and verify.

>Meta-programming to generate tests has only led to pain in my experience.

Likewise, although I view it as a deficiency of testing frameworks that require metaprogramming for parameterized testing. They shouldn't.

And on the other hand, I'm always suspicious of code that does something programming-like that is specific to the test framework.

If you need a loop, ideally you write the loop using the language, rather than using a test framework's special loop construct. For parameterization, ideally you'd use a function, rather than a special test framework concept for parameterization.

Having duals of every abstraction feature of a language in the test framework increases the cognitive load and ramp up time to become effective in a codebase, and there's always the risk of incomplete knowledge, leading to mixing and matching different abstractions - and test framework abstractions are usually more leaky and less well thought out than language abstractions.

Furthermore, they decrease readability and clarity of tests themselves. I am of the strong opinion that if you need to test 20 inputs, you should write 20 asserts, not a loop - when the test fails, you should be able to immediately tell _which_ input(s) failed, not just that a failure has occurred.

I came to this opinion by way of desiring to be clever in my tests and then finding myself having to insert `printf(input)` inside a loop in order to quickly figure out which input failed.

Code should be written so that it doesn't need to be modified. Feelings of aesthetics or desires to be clever should be avoided and are, ultimately, I think, often just vanity.

I'm thinking of rspec in particular, as it's the framework I write most tests in. It's trivial to include the loop variable in the string interpolation that describes the "context" block for the loop inner test setup. But usually I wouldn't be using a plain loop, I'd be iterating over entries from a hash map.

The same thing in something like junit is less workable, and copy and paste becomes more viable simply because the language is so clumsy and inexpressive. I've seen various extensions and annotations that let you parameterize junit tests but I don't think they make things much clearer.

A key benefit of driving the test cases off a table of inputs to outputs is that missing cases in cross products are much easier to spot. Visualising the state space and its coverage is easier when you can see a map of the covered space in one block of code.
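A small sketch of the table-driven idea in plain Python rather than rspec, assuming a made-up `can_view` rule over roles and document states. One test checks the table covers the whole cross product; the other checks each row and names the failing input in the assertion message:

```python
import itertools

ROLES = ("admin", "member", "guest")
STATES = ("draft", "published")


def can_view(role, state):
    """Hypothetical rule under test."""
    if state == "published":
        return True
    return role in ("admin", "member")


# The table doubles as a map of the covered state space.
EXPECTED = {
    ("admin", "draft"): True,
    ("admin", "published"): True,
    ("member", "draft"): True,
    ("member", "published"): True,
    ("guest", "draft"): False,
    ("guest", "published"): True,
}


def test_table_covers_cross_product():
    missing = set(itertools.product(ROLES, STATES)) - set(EXPECTED)
    assert not missing, f"untested combinations: {sorted(missing)}"


def test_can_view_matches_table():
    for (role, state), expected in EXPECTED.items():
        assert can_view(role, state) == expected, f"role={role!r} state={state!r}"
```

Adding a new role to `ROLES` without extending the table makes the first test fail and list exactly which combinations are missing.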

The one place where I've had good results from meta programming / DRY for tests is table-based validation; where you have a large cross-product of different states to test, mapping to expected outcomes.

I disagree with this mindset and perceive it as an example of applying a common principle without thinking about what it was meant for.

The Don't Repeat Yourself principle, speaking from my internalized understanding without going back to wherever it was originally described and named as such, embodies the following general principles:

* A specific functionality should only be implemented in one place; implementing the same logic in multiple places creates the possibility of one or more of those implementations being modified without the rest being modified identically.
* The total amount of code that must be maintained should be minimized reasonably, such as by eliminating duplicate code.
* If the same functionality is being implemented multiple times, that indicates that it may be an essential or otherwise important abstraction in the overall design of an application, and therefore should be abstracted.

That's the gist of it. Those principles apply to software design and architecture. Test code, however, is different in nature. It does not have abstraction and decoupling as a design goal; it is meant to test a software design in such a way as to be able to indicate specifically where an error is occurring. DRY actually works _against_ those goals - if you abstract away a test setup and make a mistake doing so, it is possible to, in the course of refactoring all relevant unit tests to use that abstraction, modify the unit tests to pass. If, in contrast, you keep setup functions entirely contained in each unit test class, you cannot possibly reach this situation, because you can only break one test at a time. The tests retain their function of testing specific source code and only breaking when that source code breaks.

DRY should not be applied without thinking to testing. It can be executed carefully and provide a benefit, but it is difficult to avoid the possibility of adding bugs to the abstractions you choose to make.

Follow up as it is too late to edit and I want to note an observation: this is why I dislike naming things unnecessarily. Giving an approach a name leads many people to treat it as some sort of absolute, because It Has Been Named, and to use it without question.

We should try to understand why we do the things we do instead of applying Named Methods on the grounds that, since they are named, they must be absolute.

I am being hyperbolic for effect, but I do believe that this is a real effect, and I personally do hesitate to name things that don't really need a name.

Writing a module and its tests together, and doing them both at the same time, is some of my #1 advice. If you're having to run main while developing, I consider something to be a bit odd.

I also find this a much more favorable approach than pure TDD. In my opinion, this method is easier to "sell" to other developers.

My opinion is that TDD can only reasonably be done if you already know what your implementation must be, which is difficult to do unless

* you have already done it, or something sufficiently similar, before;
* you have formal functional requirements; or
* you have detailed use cases.

Personally, I've recently come to really favor use cases as part of the design process. The mindset behind them requires some effort to learn, but I find them an effective vehicle for categorically separating what an application must do from any consideration for how it might do it.

This is great. It's nice to see an article from someone who doesn't "do" TDD, but also isn't ranting about how tests are useless. I personally use and prefer (test-first) TDD but still agree with all of the advice in this article.

I find that bugs occur when you do not fully understand all possible state combinations and edge cases. So if that is the case I try to break it down to smaller units that are easier to comprehend. There will still be bugs though, but they are usually edge cases you didn't imagine would happen, and that's where I find testing useful, as the next person who touches the code probably will also miss that edge case.

1) Make changes
2) Manually test & run automatic tests
3) Write automatic tests for each problem/bug discovered
4) Repeat

This only works for decoupled code though. If all units are coupled you must have automatic tests of everything as no-one can comprehend exponential complexity.

What you are describing is the process of identifying what the "units" are that should be covered by unit tests.

I am certainly no religious follower of TDD, but I do think writing tests before code is useful.

The reason is simple: it tests your tests. I have many times found bugs in tests that made them always pass.

There is an even better way to test your tests - mutation testing, for example with http://pitest.org/. It also helps to find dead code that no longer has any impact on the result.

I just took a piece of code with quite good test coverage, and stopped running main a couple of times during the "unit" test run. Coverage plummeted, and I realised how much of the code is still untested.

(The code was actually already structured for testing, I just hadn't written them because of that coverage number....)

I am still running main, by the way, but that's a different invocation called "system tests" which runs if unit tests pass (and after the coverage report).

This is a prime example of why not to covet code coverage figures. They can never be taken at face value as an indicator of testing quality or comprehensiveness.

Incidentally, OP, I empathize - I had to learn it the hard way too.

One important concept in testing is "code coverage". The technique is to (conceptually) place a unique print statement in every branch of every "IF" statement or loop (every basic block), and then try to write tests until you've triggered all of the print statements.

EDIT: This explains the concept, and gives a minimal approach to testing (i.e., you should test more than this, but at least this). Of course, there are tools to automate this, but not for every (new) language.

What you're describing is block coverage, which is only one of many types of code coverage that exist. By itself, it is rather limited in the types of issues it can discover.

There are many other types of code coverage, including branch/conditional coverage, state coverage and so forth that provide much greater depth of coverage that developers should look into using where possible. The Wikipedia article has a good introduction: https://en.wikipedia.org/wiki/Code_coverage.

Is this a joke? (I'm asking seriously.)

Edit: I mean, is there anyone in 2016 who uses testing tools that cannot instrument code to calculate coverage automatically?

I prefer to code in my highly customized emacs configuration when not coding in something that benefits unusually from an IDE (Java). Many people do this, though they seem to be decreasing in number somewhat - seems like younger interviewees increasingly haven't even heard of vim or emacs, somehow.

That doesn't fully test your code. By testing "if a", "if b" separately, you could miss a bug that occurs when a and b are both true.
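A contrived sketch of that situation, using a made-up `scale` function. The two tests together execute every statement, yet the combination of both flags is never exercised:

```python
def scale(value, a, b):
    """Hypothetical function with a bug that only shows when a and b are both true."""
    divisor = 2
    if a:
        divisor -= 1
    if b:
        divisor -= 1
    return value / divisor


def test_a_only():
    assert scale(10, a=True, b=False) == 10


def test_b_only():
    assert scale(10, a=False, b=True) == 10


# Both tests pass and every line above has been executed,
# but scale(10, a=True, b=True) divides by zero.
```

Whether the missing case belongs in a unit test or an integration test is a separate argument; the point here is only that block coverage cannot see it.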

> That doesn't fully test your code.

I didn't say that. Be wary of people who say they fully tested your code :)

Anyway, there's another (non-waterproof) approach: try to trigger all possible paths through the code (instead of all basic blocks), but the problem is that the number of paths can increase exponentially with code size.

The code-coverage approach, in contrast, is very cost-effective. For example, roughly speaking, it triggers all possible exceptions that your code can throw.

It won't trigger exception handling that hasn't been written because you forgot to check a certain sort of bad input. In that case, you may still get 100% code coverage, which many will erroneously take to mean that all possible behaviors are verified.

100% code coverage can _never_ be taken, as a figure, to indicate that testing is comprehensive.
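A minimal sketch of that point with a made-up `average` function. The single test below executes every line of the function, so a coverage tool would report it as fully covered, even though the input checks were never written:

```python
def average(values):
    """Hypothetical function; handling for None and empty input was forgotten."""
    return sum(values) / len(values)


def test_average_good_input():
    # Executes every statement in average(): 100% coverage of the function.
    assert average([2, 4, 6]) == 4


# Neither of these behaviors is exercised by the test above:
#   average([])    raises ZeroDivisionError
#   average(None)  raises TypeError
```

The coverage figure can only count code that exists; it has no way to flag the branch nobody wrote.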

> The code-coverage approach, in contrast, is very cost-effective. For example, roughly speaking, it triggers all possible exceptions that your code can throw.

Well, unless you count uncaught exceptions from things your code calls.

Or unless you simply forgot to check all possible invalid inputs - which, all of us being human, will happen periodically, even with code review.

But the goal of unit tests is exactly and specifically to test single cases such as "if a" and "if b", discretely and independently of one another. More complex cases such as "if A and B" are what integration tests are written for.

What test suites generally show as coverage is called statement coverage. This only means that all the statements have been executed at least once by the test suite.

There are other types of coverage, like branch coverage, which is more like what you've described, and path coverage. Path coverage checks that all independent paths are executed at least once, but it's tedious and memory-intensive to calculate and hence impractical.

Having responded to several comments here, I am concerned about the fact that most of the discourse here seems to fail to completely understand what the goals are of unit testing - and, worse, many comments, despite this omission, seem to be made with an air of confidence which I could see myself, when I was a junior developer, accepting as reliable, because of that tone. As of this writing, I feel that anyone new to unit testing that comes across this overall discussion will be sent down the wrong path and may not realize it for a very long time, and so I feel that it is important to outline what I feel are the most serious misconceptions about unit testing I see here.

* Code coverage's value: code coverage is not a goal in and of itself. Seeing 100% code coverage should not make you feel comfortable, as a statistic, that there is adequate testing. If you have 100% coverage of branching, you might have indeed verified that the written code functions as intended in response to at least some possible inputs, but you have not verified that all necessary tests have been written - indeed, you cannot know this from this simple metric. To give a concrete example: if I write one test that tests only a good input to a single function in which I have forgotten a necessary null check, I will have 100% code coverage of that function, but I will not have 100% behavioral coverage - which brings me to the following point.

* What to think about when unit testing a function, or how to conceptualize the purpose of a unit test: unit tests should test behavior of code, so simply writing a unit test that calls a function with good input and verifies that no error occurs is not in the correct spirit of testing. Several unit tests should call the same function, each with various cases of good and bad input - null pointer, empty list, list of bogus values, list of good values, and so on. Some sets of similar inputs reasonably can be grouped into one bigger unit test, given that their assert statements are each on their own line so as to be easily identifiable from error output, but there should nevertheless be a set of unit tests that cover all possible inputs and desired behaviors.

* Unit test scope: A commenter I responded to in another thread had given criticism to the effect that, by making two unit tests which test cases A and B entirely independent, you fail to test the case "A and B". This is a misunderstanding of what the scope of a unit test should be in order to be a good unit test - which, incidentally, goes along with misunderstanding the intent of a unit test. A unit test, conceptually, should check that the behavior of one piece of functional code under one specific condition is as intended or expected. The scope of a unit test should be the smallest scope a test can have without being trivial; we write unit tests this way so that a code change later that introduces a bug will hopefully not only be caught, but be caught with the most specificity possible - test failures should tell the engineer a story along the lines of "_this_ code path behaved incorrectly when called with _this_ input, and the error occurs on _this_ line". More complex behavior, of the sort of "if A and B", is the domain of an integration test; integration tests are the tool that has been developed to verify more complex behavior. If you find yourself writing a unit test that is testing the interaction of multiple variables, you should pause to consider whether you should instead move the code you are writing into an integration test, and write two new, smaller unit tests, each of which verifies the behavior of one input independently of the other.

* Applying DRY to test setup: if you abstract away test setups, you are working against the express intention of each unit test being able to catch one specific failure case, independently of other tests. Furthermore, you are introducing the possibility of systematic errors in your application in the _very possible_ case of inserting an error in the abstractions you have identified in your test setup! Furthermore, if you find yourself setting up the same test data in many places, that should not suggest to you to abstract away the test setup - to you, it should rather hint at what is likely a poor separation of concerns and/or insufficient decoupling in your software's design. If you are duplicating test code, check whether you have failed to apply the DRY principle in your application's code - don't try to apply it to the test code.

And, in my opinion, the most important and common misconception I see here, and I really feel that it should be more widely understood - and, in fact, that many problems with legacy code will likely largely stop occurring if this mindset becomes widespread:

* Why do we write unit tests?

We write unit tests to verify the behavior of written code with respect to various inputs, yes. But that is only the mechanics of writing unit tests, and I fear that that is what most people think is the sole function of unit tests; behind the mechanics of a method there should be a philosophy, and there is.

Unit tests actually serve a potentially (subjectively, I would say "perhaps almost always") far more vital purpose, in the long term: when an engineer writes unit tests to verify behavior of the code he has written, he is, in fact, writing down an explicit demonstration of what he intended the program to _do_; that is, he is, in a way, leaving a record of the design goals and considerations of the software.

(Slight aside: in my opinion, being a good software engineer does _not_ mean you write a clever solution to a problem and move on forever; rather, it means that you decompose the problem into its simplest useful components and then use those components to implement a solution to the problem at hand whose structure is clear by design and is easy for others to read and understand. It further means (or should mean) that you then implement not only verification of the functionality you had in mind and its robustness to invalid inputs which you cannot guarantee will never arrive, but also implement that verification in such a way that it indicates what your design considerations were and serves as a guard against a change, made by someone else (or yourself!) at a later time, that unknowingly contradicts these considerations.)

Later, when the code must be revisited, altered, or fixed, such unit tests, if well-written, immediately communicate what the intended behavior of the code is, in a way that cannot be as clearly (or even necessarily, almost definitely not immediately) inferred from reading the source code.

In summary, these are the main points that stuck out to me in the conversations here; I do want to emphasize that the last point above is, in my opinion, the most glaring omission here, because it is an overall mindset rather than a particular consideration.

I like this article, however, I would emphasise that there is a balance that must be struck between writing mini-DSL/fixture generating code v.s. writing simple data structures (e.g. Object literals that mimic JSON).

It's a good thing to take extra care writing the generating code, as any brittleness is passed onto dependent tests.

I think that languages that have a very good REPL make it easier to write tests. Because you play around in the REPL, you automatically write a few tests that you just have to copy. Also, if you design your software to be REPL-friendly, there is a lower overhead for your tests (easier set-up etc.).

I avoid running main by testing at compile-time :-).

