How deep are your unit tests? (stackoverflow.com)
62 points by thallavajhula on Dec 10, 2012 | 64 comments

Several successful projects exist without TDD (or even unit tests for that matter)

The whole of GNU+Linux for example (or at least most of it)

On the other hand I've seen several projects claim "very good TDD coverage" and then crash and burn when put into production (usually posted to HN).

Real-world, real-usage testing is essential. TDD is good for keeping you on track and avoiding regressions.

It's also good for small but complicated pieces of software that do something complex and are prone to having their behaviour adjusted over time (think: reports, data consolidation or analysis, calculations, etc.)

if rms read it, he would suggest editing GNU+Linux to GNU/Linux.

Yes, I wrote it like this to be clear that I mean the Linux kernel and the GNU userspace

I don't think RMS reads HN.

I find that simply assert'ing all non-trivial assumptions and invariants in the code is the form of unit testing that works the best in a heck of a lot of cases.

If there's need for explicit tests, just use the code in the right way and see it exit cleanly. Then use the code in the wrong way and see it assert (first replacing the assert() with a throw/longjmp to catch failures without aborting).

Asserts also double as a concise context documentation. When an assert is triggered, it's typically easy to see what its condition means and what has gone wrong. So it's a win-win all around :)
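A minimal sketch of that workflow in Python, where a failed assert already raises an exception, so no assert-to-throw swap is needed. The `withdraw` function and the `fails_assertion` helper are made up for illustration; note that running Python with `-O` strips asserts.

```python
def withdraw(balance, amount):
    # Preconditions: assumptions the caller must satisfy.
    assert amount > 0, "amount must be positive"
    assert amount <= balance, "cannot withdraw more than the balance"
    new_balance = balance - amount
    # Invariant: a balance never goes negative.
    assert new_balance >= 0, "balance went negative"
    return new_balance


def fails_assertion(func, *args):
    """Return True if calling func(*args) trips an assert."""
    try:
        func(*args)
    except AssertionError:
        return True
    return False


# Use the code the right way and see it exit cleanly.
remaining = withdraw(100, 30)
# Use it the wrong way and see it assert.
rejected = fails_assertion(withdraw, 100, 500)
```

The assert messages double as the "context documentation" mentioned above: when one fires, the condition and message say which assumption was broken.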

I've found asserts to be useful in addition to unit testing. The advantage of automated tests is that you don't have to run tests manually. Asserts merely tell you that something is wrong, they don't run test cases for you.

I do this. In fact, for a lot of what I do, assertions are good enough to catch any issues early. That and integration tests are probably more valuable than religious unit tests.

The other advantage of assertions is that if you leave them on in production, when an issue hits you know what it is straight away and can fix it.

Many of our system errors are fixed and deployed before the user has finished raising the issue.

I agree with the first answer.

I'm not a fan of unit testing. I'm just doing it because it's a requirement (and we still have no QA).

Why am I not a fan? Because I'm writing my own tests, and they probably have bias all over them. I know when a test will run successfully and I expect where it will fail. This is not the scenario that I wanted. (I know this is not a good testing scenario.)

In my opinion, a programmer must write a good code. And leave the testing to the QA.

This is one of the reasons I really like property-based testing à la QuickCheck[1]. The core idea is simple: you come up with an invariant for your code and the testing framework checks this invariant with randomly generated inputs. So when you're writing a test, you just have to come up with interesting invariants; you do not have to guess which inputs are edge cases.

[1]: http://www.haskell.org/haskellwiki/Introduction_to_QuickChec...

This may sound complicated and seem like overkill, but I've actually found it easier to use this style of tests for much of my code. Having randomly generated inputs can help find edge cases I did not even consider when writing the code in the first place--this addresses the bias issue you are worried about.

I also find that this style of test results in concise, easy to read tests. The usual "hello world" example uses the reverse function for lists. We want to ensure that for all lists, reversing it twice does nothing. The test would look like this:

    prop_reverseTwice ls = reverse (reverse ls) == ls

It's very easy to tell what invariant you're testing for! (The prop_* name is a convention for this sort of test.)

While real invariants are often more complex, I find they are still usually easy to read right from the code for the test.

Anyhow: you should really try QuickCheck or something like it for your language of choice.
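For readers outside Haskell, here is a rough hand-rolled sketch of the same idea in Python, using only the standard library. It is not QuickCheck's API and it does no shrinking of counterexamples; `check_property`, `random_int_list` and both property names are invented for this example. Real libraries (Hypothesis, for Python) do this far more thoroughly.

```python
import random


def check_property(prop, generate, runs=200, seed=0):
    """Run prop on randomly generated inputs; return the first counterexample or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def random_int_list(rng):
    # Lengths 0..20, so the empty list gets exercised as well.
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]


def prop_reverse_twice(ls):
    # The invariant from the comment above: reversing twice changes nothing.
    return list(reversed(list(reversed(ls)))) == ls


def prop_reverse_is_identity(ls):
    # Deliberately false property, to show a counterexample being found.
    return list(reversed(ls)) == ls


counterexample_good = check_property(prop_reverse_twice, random_int_list)
counterexample_bad = check_property(prop_reverse_is_identity, random_int_list)
```

The true property yields no counterexample, while the false one is refuted by some randomly generated list that nobody had to think up by hand.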

I agree that QuickCheck is really nice. It helped me spot some errors in e.g. tree diff'ing code that I wrote. QuickCheck especially works well for fundamental data structures and algorithms, where invariants are clear and simple.

However, in other cases, the usage is less obvious. For instance, suppose that you are writing, say a URL parser. What do you want QuickCheck to generate and check? If you generate too broadly, the invariant check becomes a parser itself, if you generate too specifically you might introduce bias again.
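One possible middle ground, sketched under assumptions: generate the structured parts first, render them to a string, and check that parsing recovers the parts. Here `urllib.parse.urlsplit` stands in for the parser under test, and the generator names are hypothetical. The generator only produces tidy lowercase URLs, so the bias concern raised above still applies; it just avoids rewriting the parser inside the test.

```python
import random
import string
from urllib.parse import urlsplit


def random_word(rng, max_len=8):
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(1, max_len)))


def random_url_parts(rng):
    # Generate the structured pieces first, so the expected parse result is known.
    return {
        "scheme": rng.choice(["http", "https"]),
        "host": random_word(rng) + ".example",
        "path": "/" + "/".join(random_word(rng) for _ in range(rng.randint(1, 4))),
        "query": "&".join(f"{random_word(rng)}={random_word(rng)}" for _ in range(rng.randint(1, 3))),
    }


def render(parts):
    return f"{parts['scheme']}://{parts['host']}{parts['path']}?{parts['query']}"


def prop_parse_recovers_parts(parts):
    parsed = urlsplit(render(parts))
    expected = (parts["scheme"], parts["host"], parts["path"], parts["query"])
    return (parsed.scheme, parsed.netloc, parsed.path, parsed.query) == expected


rng = random.Random(1)
samples = [random_url_parts(rng) for _ in range(300)]
failures = [p for p in samples if not prop_parse_recovers_parts(p)]
```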

That's a good point. I remember having exactly that problem when writing tests for my toy language's parser. Generating random valid programs was actually tricky for several different reasons. In fact, I ultimately gave up on QuickCheck completely for that project, but mostly out of laziness.

Amusingly, I also wrote some tree diff code in Haskell recently. However, I did it rather poorly so I'm going to have to rewrite it soon (probably some time during winter break). I definitely agree that algorithms like this are particularly well-suited for QuickCheck.

> In my opinion, a programmer must write a good code.

I always write good code, but somehow that code turns to shit after six months, and I promise no one touched it. So, I guess I don't need the tests when I'm writing the code but I need them badly when I maintain it.

It's quite the opposite for me: I write code with lots of tiny errors (most of them off-by-one mistakes with 0- or 1-based array sizes, or not understanding how an API works), so I test as much as possible. But the true value of tests is regression testing. I can refactor something and run the tests knowing that if I messed something up, a test will flare up.

I still try to avoid errors, but it's pretty much nigh impossible.

Oh dear Lord, how true this is.

I suspect there are some subtleties to that first answer which you've overlooked.

It is written by Kent Beck. He pretty much invented the concept of test-driven development. He's probably the most well-known advocate of developers writing their own tests there is.

What his answer demonstrates (to me) is that the conclusions he and the other XP guys arrived at - to write code in tiny chunks, each one preceded by a tiny failing test - is based entirely in pragmatism. He does it because it works better than anything else he's come across, including leaving the tests up to a dedicated QA team.

"... because it works better..."

Except that it hasn't been shown to work better. Quoting from http://third-bit.com/blog/archives/4529.html: "If you ask its advocates for evidence, they’ll tell you why it has to be true; if you press them, you’ll be given anecdotes, and if you press harder, people will be either puzzled or hostile."

There is very little research which shows that TDD is better than other approaches. Perhaps a bit, for some cases, but not enough to be certain about its general applicability. (Or has this changed in the last three years since I researched this topic?)

I hate writing code in "tiny chunks, each one preceded by a tiny failing test." I prefer to use a small set of burn tests until I have the main structure in place. Only then do I do the tiny tests - through the high-level API where possible (so I can refactor later) - and I use coverage analysis to make sure I'm not missing any obvious tests.

Coverage analysis is not part of TDD, which is a shame. (Technically that's because TDD is a design methodology, while coverage analysis is part of testing. While TDD generates tests, these are not a complete set of tests. For example, security analysis and scalability are outside the scope of TDD.)

> Except that it hasn't been shown to work better. Quoting from http://third-bit.com/blog/archives/4529.html: "If you ask its advocates for evidence, they’ll tell you why it has to be true; if you press them, you’ll be given anecdotes, and if you press harder, people will be either puzzled or hostile."

Oh good, evidence against is just as preachy and unscientific as the evidence for.


Research suggests that unit testing has a very strong correlation with correct software. Research suggests TDD has no correlation with correct software, except for the fact that TDDers tend to write 2x more tests than non TDDers.

Your requirement of 'evidence against' isn't how science works. The burden of evidence rests with the people making the claim. If you make a claim and do not provide evidence on request, that claim is not supported; it is not necessary to supply 'evidence against' for the claim to be unsupported, and there's no good reason to believe unsupported claims.

This is a discourse, not a science class; both sides of the argument should have evidence.

Who says there are only two sides? There are dozens or hundreds. I can make a new one - programming is best done with a candle burning on the table and a glass of wine or whisky available. It works for me, so I say it should work for everyone. I call it "Romantic Driven Programming" or "RDP." Now, what's the evidence that you have against RDP as a best practice?

You don't have any, do you? So obviously I can now promote it as the new universal methodology.

(Or more directly, you think the argument is that there is evidence for and evidence against TDD. My argument is that there isn't enough evidence to make a conclusion.)

Given that this is a discourse, have you read any of the discussions of the topic of TDD? For example, "Making Software" (eds. Oram and Wilson) has an excellent chapter summarizing the research results. If not, then it's a rather unbalanced discourse, no? You pointed to a list of papers, but didn't describe what you wanted to discuss from them.

For example, some of the research compares test-first with test-last. I'm a proponent of test-during, and don't think that test-first and test-last are the only two ways to compare things.

I'm not sure how one would do such research.

You would need to compare teams of equal skill level who were working on very similar software over quite a long period of time.

What would the benchmarks be? Number of defects in production code? Time to ship? Severity of defects? Time spent debugging?

It's not easy research, but there have been some papers published on the topic. The problem of course is that there are a lot of confounding factors.

Besides, that's my point, isn't it? Promoters of TDD argue that their personal intuition suffices, despite having little evidence to back their claims.

> In my opinion, a programmer must write a good code. And leave the testing to the QA.

Aside from the "not my problem" vibe that gives off, I strongly disagree. QA is helpful for covering weird user interactions and writing tests against bugs discovered in production, but you as the programmer should know what your code is supposed to do while you're writing it. Best write tests then rather than six months later when you've forgotten what it was for.

And when someone else is maintaining your code, and changes something deep within it that has an unexpected side effect, how will you tell whether it's broken something or not?

That is the only time I write unit tests. I never write them on the first run of the software, but when I start changing the product down the track (maintenance, upgrades, bugs, whatever) I'll add tests to make sure things don't explode and that, hopefully, I'm not introducing problems. That said, things are often rushed and the tests get skipped, even when I'm low on confidence about introducing errors :P

The thing I really dislike are unit testing styles that use mock objects a lot.

I see some people say that you should test each object in isolation, mocking out any dependencies. That seems wrong headed to me.

I prefer my unit tests to test everything all the way down.

The only thing I would mock is the file system, which is useful for testing file loading code.
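A small sketch of what that can look like in Python, assuming a made-up `key = value` settings format. The parsing logic takes any text stream, so the test feeds it an in-memory `io.StringIO` instead of a real file; only the thin path-based wrapper touches the disk.

```python
import io


def load_settings(stream):
    """Parse 'key = value' lines from any text stream, skipping blanks and # comments."""
    settings = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def load_settings_from_path(path):
    # Thin wrapper that touches the real file system; kept trivial on purpose.
    with open(path, encoding="utf-8") as handle:
        return load_settings(handle)


# The "mocked file system": an in-memory stream standing in for a file on disk.
fake_file = io.StringIO("# comment\nname = demo\n\nretries = 3\n")
loaded = load_settings(fake_file)
```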

> I prefer my unit tests to test everything all the way down.

They already have that, that's called integration testing. Unit testing is trying to test as small of a unit of code as possible - this helps you identify exactly where the error is. If all of your tests are integration tests and something fails, you have no idea which part of your stack the problem is in.

Obviously different projects require different kinds and amounts of test cases, but I prefer a mixture of both unit tests and integration tests (unit tests for small, complex blocks of logic, integration tests for most of the other stuff).

> If all of your tests are integration tests and something fails, you have no idea which part of your stack the problem is in.

Not necessarily, because if you have a lot of integration tests then probably multiple ones will fail given a bug in a certain module, and the pattern of the failures might be all you need to locate the source of the problem.

One of the most successful projects I ever worked on, in robustness/quality terms, didn't really have any unit tests at all, but it had a comprehensive suite of end-to-end test cases that could be run automatically. Many of those didn't (and couldn't) have an absolute true/false outcome, either, but looking at the results they generated and applying various heuristics developed from experience, they were remarkably useful for similar reasons to unit tests.

Here are qualities of good unit tests:

1) They run fast.

2) They help us localize problems.

In the industry, people often go back and forth about whether particular tests are unit tests. Is a test really a unit test if it uses another production class? I go back to the two qualities: Does the test run fast? Can it help us localize errors quickly? Naturally, there is a continuum. Some tests are larger, and they use several classes together. In fact, they may seem to be little integration tests. By themselves, they might seem to run fast, but what happens when you run them all together? When you have a test that exercises a class along with several of its collaborators, it tends to grow. If you haven't taken the time to make a class separately instantiable in a test harness, how easy will it be when you add more code? It never gets easier.

- From "Working Effectively With Legacy Code" by Michael Feathers. (A really good book that actually does a better job of teaching design than many a book because like all good teaching it starts with the examples and abstracts from there, not the other way around.)

That's what I thought until my first summer internship at a Rails shop. Our unit test suite could take a very long time to run (on my Macbook Air dev machine, it would take 15-20 minutes to run). The slowdown came from testing ActiveRecord models with many dependencies, and from creating sample models for them. As a result, our tests hit the test database multiple times, which slowed them down.

Our solution was to start mocking as many models as we could, which really did improve test runtime.

(Slight edit: after reading the other replies, now I really understand what integration tests are for.)

How many issues do you think may be hidden by mocking out the database layer? It's easy to write bad data due to a subtle bug in a query builder for example and then write garbage into the database. Mocking would make that significantly harder to catch.

You're quite right, query builder statements are error-prone and difficult to stub; that's why you shouldn't sprinkle them throughout your app. Rather than write things like

    User.active.in_group(x).not_replied.each { ... }

all over the place, write scopes on your User model (e.g. User.awaiting_response_for_group(x)) that represent each use case, test them thoroughly against the database, and then stub them out when testing client code. That way you only have to test the queries in one place, and you don't couple the rest of your code to your query builder.
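The same split can be sketched outside Rails. This is Python with the standard-library `sqlite3` module rather than ActiveRecord, and `UserRepository`, `reminder_messages` and `StubUsers` are hypothetical names: the named query is tested once against a real (in-memory) database, and the client code is tested with that query stubbed out.

```python
import sqlite3


class UserRepository:
    """Hypothetical stand-in for a model with one named query per use case."""

    def __init__(self, conn):
        self.conn = conn

    def awaiting_response_for_group(self, group_id):
        rows = self.conn.execute(
            "SELECT name FROM users"
            " WHERE active = 1 AND group_id = ? AND replied = 0 ORDER BY name",
            (group_id,),
        )
        return [name for (name,) in rows]


def reminder_messages(users, group_id):
    # Client code depends only on the named query, not on how it is built.
    return [f"Reminder for {name}" for name in users.awaiting_response_for_group(group_id)]


# 1) Test the query itself against a real (in-memory) database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, active INTEGER, group_id INTEGER, replied INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [("ann", 1, 7, 0), ("bob", 1, 7, 1), ("cy", 0, 7, 0), ("dee", 1, 8, 0)],
)
from_database = UserRepository(conn).awaiting_response_for_group(7)


# 2) Test the client code with the query stubbed out (no database involved).
class StubUsers:
    def awaiting_response_for_group(self, group_id):
        return ["ann", "eve"]


from_stub = reminder_messages(StubUsers(), 7)
```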

To avoid getting hung up on taxonomy, I much prefer defining the scope of tests with the control theory terms observability and controllability (http://en.wikipedia.org/wiki/Observability). That makes it much easier to categorize tests in terms of how effective they are, and gives a useful starting point for determining whether the granularity of a test is really adequate to test the desired behaviour.

Control theory also tells us how feedback loops are related to those two terms; a lot of discussions about Agile, etc. would be much more productive if people understood that it is the feedback loop that matters (as opposed to particular methodology flavour) and why it does.

So, you prefer your unit tests to be integration tests?

I think that's leaning more towards some kind of integration testing. You're supposed to test one unit independently, and then write integration tests that actually test that the whole system works. Not saying that's the best way, but I think that's how people think.

A unit test tests a unit of code. An integration or functional test tests the interaction between multiple units of code, usually from the point of view of "user needs functionality X, functional test Y ensures that X works". Unit testing is not supposed to catch bugs, it is primarily to ensure a solid design. Integration/functional tests are primarily there to ensure that the code works.

It only makes sense to unit test code which is in the domain of architecture. This ensures that your domain model is solid.

It only makes sense to functional test code which is a concrete implementation running on top of said architecture. This ensures that the end user sees what they need to see.

One of the often quoted benefits of unit testing is that they run fast. That fact is only incidental. Not to mention the insanity of unit testing as much as you can just because they're faster to run - they are also an order of magnitude slower to write than functional tests.

tl;dr - use the right tool for the right job. Unit testing is for design/architecture. Functional testing is for concrete functionality.

Technically it is called integration testing but I agree with you on the idea.

Unless there's a reason to mock something up (like simulating a fail condition from a DB or network call, or simulating other code that isn't finished yet) I prefer to just test everything as close as possible to what is actually running. The level of granularity, like the OP said, changes depending on the complexity and the critical nature of each part of the code.
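As an illustration of the "simulate a fail condition" case, here is a short Python sketch using the standard `unittest.mock` module. `fetch_greeting` and the client's `get_name` method are invented for the example; the point is that `side_effect` lets a test provoke an outage on demand, which is hard to do against a real network service.

```python
from unittest import mock


def fetch_greeting(client, user_id):
    """Return a greeting, falling back to a generic one if the lookup fails."""
    try:
        name = client.get_name(user_id)
    except ConnectionError:
        return "Hello, guest"
    return f"Hello, {name}"


# Happy path: the fake client returns a value.
working = mock.Mock()
working.get_name.return_value = "Ada"
happy = fetch_greeting(working, 1)

# Failure path: side_effect makes the call raise a simulated network error.
broken = mock.Mock()
broken.get_name.side_effect = ConnectionError("simulated outage")
fallback = fetch_greeting(broken, 1)
```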

Some people argue that if I make an error in function A and that causes a unit test fail for function B - that can mislead a developer into thinking function B has an error. I argue that a) it rarely misleads anybody for very long and b) if I had function B tested in total isolation with mocks then that error in function A might have slipped through unnoticed. It's tricky to mock up some combination that you weren't expecting to happen.
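A toy Python illustration of point (b), with made-up functions and an intentional bug in "function A". The isolated test of B passes because A has been replaced by a stub that returns what the test author expected; the test that runs B on top of the real A exposes the problem, even though it reports it under B's name.

```python
def net_price(gross_cents, tax_percent):
    # Function A, with an intentional bug: it adds the tax instead of removing it.
    return gross_cents + gross_cents * tax_percent // 100


def invoice_line(gross_cents, tax_percent, price_fn=net_price):
    # Function B: formats whatever A computes.
    return f"net: {price_fn(gross_cents, tax_percent)}"


# Isolated test of B with A stubbed out: looks fine, A's bug stays hidden.
isolated = invoice_line(120, 20, price_fn=lambda gross, tax: 100)

# B tested "all the way down": the wrong value surfaces, flagged against B.
integrated = invoice_line(120, 20)
```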

Obviously in ideal circumstances you would be doing both unit and integration testing. But, you have to draw a line somewhere otherwise you just write tests all day and get no production code running.

If you need lots of mocks, it's a sign of coupling which means a bad fundamental design.

You need to test both the components individually (unit tests) and their interaction (integration tests).

We use Commons VFS for filesystem abstraction ( http://commons.apache.org/vfs/ ) and our own port of the API for .Net.

In fact we have our own systems API (more Google AppEngine style than, say, the .Net Framework or Java libraries) on which we build our software.

That may be true, but sometimes it can also be a pain when you are talking to everything through a load of abstraction layers.

When something blows up it can be nice just to get a simple error rather than a 20 page stack trace.

> That may be true, but sometimes it can also be a pain when you are talking to everything through a load of abstraction layers.

Exactly. Sometimes the benefit of a certain kind of test does not outweigh the cost of distorting a useful design so it will support the test code. That cost might be directly in terms of performance overhead, or it can be a more subtle overhead through (for example) weakening language-enforced modularity in your architecture so you can insert mock/stub placeholders where you want them for testing purposes, which can hurt everything from maintenance overheads to the effectiveness of code reviews.

I think dependency injection is designed to mostly solve that problem, but with that you have to program in a very specific way (e.g. calling the DI framework's factory methods rather than just using "new" and a constructor method).

This makes it more difficult to retrofit to code and is sort of awkward when you are integrating with different bits of code and libraries that don't use DI.

Thus the usefulness of mocking frameworks to "force" a particular class to be a certain implementation for the purposes of testing.
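For comparison, plain constructor injection needs no framework at all. A minimal Python sketch, with a hypothetical `Greeter` class: the dependency (a clock) is passed in with a sensible default, so production code is unchanged while a test supplies a fixed clock. This only works for code written with the seam in place, which is the retrofit problem described above.

```python
import datetime


class Greeter:
    """Takes its clock as a constructor argument instead of calling datetime directly."""

    def __init__(self, now=datetime.datetime.now):
        self._now = now

    def greeting(self):
        return "Good morning" if self._now().hour < 12 else "Good afternoon"


# Production code can rely on the default clock; a test injects a fixed one.
morning = Greeter(now=lambda: datetime.datetime(2012, 12, 10, 9, 0)).greeting()
afternoon = Greeter(now=lambda: datetime.datetime(2012, 12, 10, 15, 0)).greeting()
```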

Hint: You never have to distort a design for testing, if it's not a crock. It's a great indicator of brokenness.

That's not a hint, it's a conceit, born in the kind of fantasy world that drinks whatever folks like Bob Martin are selling and loves names like SimpleBeanFactoryAwareAspectInstanceFactory and functions no more than five lines long.

In the real world, software architecture is the servant of many masters, often including multiple kinds of testing, and sometimes you can't do everything that each of them would like all at the same time. For a start, they can be objectively contradictory: every time you add a dedicated test hook or introduce inversion of control to support unit testing, for example, you have also created additional cases that need to be considered during a code review.

However, it's important to remember that all these testing methods and design techniques are merely means to an end. They are valuable precisely to the extent that they help us build software to do useful things, and no more. If they sometimes conflict, that's OK, as long as we favour the one that most helps us to get the job done. However, things tend to go wrong if we start nerfing one aspect just because dogma says we must not compromise on another.

A 20 page stack trace is the symptom of another problem, not a result of decoupling.

True, but the most popular method of decoupling seems to be via using an abstraction layer of some kind. At some point you need to talk to a concrete "thing".

It's also difficult to know ahead of time how much abstraction is the right amount.

For example, you abstract disk access through an RDBMS but you might want to change DBMS so you abstract that through an ORM. But maybe you want to switch away from RDBMS altogether and go with a NoSQL solution or do everything with message queues or REST APIs? Should you abstract up another level also?

That's vertical decoupling.

Well the point of the abstraction is that you don't care about the contents, hence the term "black box abstraction".

It doesn't matter to the interface consumer what the implementation is, just that it does what the interface says it will do.

Our VFS layer has various backends which can be picked based on the client's requirements and this covers REST APIs, RDBMS and filesystems. There are various depths of stacks and abstractions hiding behind all that but to the consumer, it doesn't matter.

If it blows up anywhere in the stack, we can throw 80 levels of stack right down to the container thread if need be, but we only care about the abstraction at the point of failure.

True, if all of your classes can conform to the same interface.

When you integrate some third party black box with a different interface you're probably going to have to write at least one wrapper class.

True also. We wrap Amazon S3's client behind our VFS layer.
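A generic sketch of such a wrapper class in Python. None of this is the Commons VFS or Amazon S3 API; `InMemoryStore`, `ThirdPartyBlobClient` and `BlobStoreAdapter` are invented stand-ins that only show the shape: the consumer sees `read` and `write`, and the adapter translates to whatever the third-party black box expects.

```python
class InMemoryStore:
    """One backend that already matches the shared read/write interface."""

    def __init__(self):
        self._data = {}

    def read(self, path):
        return self._data[path]

    def write(self, path, content):
        self._data[path] = content


class ThirdPartyBlobClient:
    """Hypothetical third-party client with a different interface."""

    def __init__(self):
        self._blobs = {}

    def put_blob(self, bucket, key, body):
        self._blobs[(bucket, key)] = body

    def get_blob(self, bucket, key):
        return self._blobs[(bucket, key)]


class BlobStoreAdapter:
    """Wrapper that exposes the third-party client through read/write."""

    def __init__(self, client, bucket):
        self._client = client
        self._bucket = bucket

    def read(self, path):
        return self._client.get_blob(self._bucket, path)

    def write(self, path, content):
        self._client.put_blob(self._bucket, path, content)


def copy_file(source, destination, path):
    # The consumer only knows about read() and write().
    destination.write(path, source.read(path))


local = InMemoryStore()
local.write("report.txt", "hello")
remote = BlobStoreAdapter(ThirdPartyBlobClient(), bucket="backups")
copy_file(local, remote, "report.txt")
copied = remote.read("report.txt")
```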

Testing is something I do a lot, and like talking about a lot as well. As with other things, there are no hard and fast answers to testing, only lots of opinions (they are like assholes: everyone has one, and everyone else's stinks). So here go mine (not necessarily correct, but worth putting here). I am putting them as one-liners; let me know your thoughts.

1. What do you mean by unit in unit tests?

An implementation of very basic feature, probably not more than 100 lines of code.

2. What do you test when you write a unit test?

We test the basic correctness, e.g. given a URL, if it exists assert 200, if not assert 404.

3. What should not be tested?

The built-in libraries should not be tested. E.g. if you are using a library, say Django, you should assume it comes pre-tested and should not spend time writing tests for it.

4. What should be the depth of the tests?

There are no rules, but you should stop when you feel like you are humming inception theme.
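To make point 2 concrete, here is a framework-free Python sketch. The `PAGES` table and `handle` function are hypothetical stand-ins for a real view or router (with Django you would go through its own test client instead); the two tests are exactly the "exists gives 200, missing gives 404" checks.

```python
PAGES = {"/": "home", "/about": "about us"}  # hypothetical routing table


def handle(path):
    """Return (status, body) for a request path."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, "not found"


def test_existing_url_returns_200():
    status, _ = handle("/about")
    assert status == 200


def test_missing_url_returns_404():
    status, _ = handle("/no-such-page")
    assert status == 404


test_existing_url_returns_200()
test_missing_url_returns_404()
```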

If you want to read about this in slightly more detail, it's at http://www.blog.fruiapps.com/2012/09/An-intro-tutorial-to-te...

Kent Beck's response is interesting. How does one know that you write correct code until you realise there is a bug in it? I thought unit tests were developed for when you refactor code - at this point you run the unit tests and if any fail, you have refactored in a bug...

It's partially an answer to jiggy2011's question from http://news.ycombinator.com/item?id=4898861, but I think it's worth mentioning separately.

There's a very good sentence in SO thread - j_random_hacker's: "Every programmer has a probability distribution of bug locations; the smart approach is to focus your energy on testing regions where you estimate the bug probability to be high."

So Kent Beck is not skipping some tests because he thinks he's awesome (as jiggy2011 said it sounds like); he does that because he can anticipate, from experience, that a particular type of bug is of very low probability for him, because he knows he's unlikely to make it. For example, I tend not to put + instead of ++ in C++, so I can be pretty confident my code is free of these types of errors. Everyone needs to estimate their own probability distribution (factoring in unknown unknowns) and test accordingly. The more experience one has, the better the estimate.

A good strategy I've found was suggested by Reid Burke in response to this answer:


Your code isn't uniform, and so you shouldn't be expecting 100% code coverage on everything either.

Instead, assign to particular sections of your code levels of stability, and for the code that has the highest stability/least amount of expected future change, prioritize writing the most unit tests for those.

> "If I don't typically make a kind of mistake (like setting the wrong variables in a constructor), I don't test for it."

Hmm, that sounds like the old "You don't need to test, just be an awesome programmer, duh!" argument.

The kind of mistakes you never anticipate making are probably the sort of stuff you should be testing for.

The number of times I've had something fail on an edge case that turned out to be a typo in a variable name somewhere...

Don't know why this has so many downvotes.

Your tests routinely fail because of typos?

I've seen production code fail because of typos after months in deployment (because that's how long it took that else { branch to be hit in practice).

The code in question was only subject to very high level smoke tests that looked like they were basically cherry picked best case scenarios intended to pass.

A unit test would have picked this up before it even got to production.
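A contrived Python example of that failure mode, with an intentional typo in the rarely taken branch (`shipping_cost` and its numbers are made up). The interpreter only notices the misspelled name when that branch actually runs, so a best-case smoke test passes while a unit test that exercises the else branch fails immediately.

```python
def shipping_cost(weight_kg, express):
    if not express:
        return 5 + 2 * weight_kg
    else:
        # Intentional typo for illustration: "weigth_kg" is not defined anywhere.
        return 15 + 3 * weigth_kg


def raises_name_error(func, *args):
    """Return True if calling func(*args) hits an undefined name."""
    try:
        func(*args)
    except NameError:
        return True
    return False


# The branch that best-case smoke tests hit: works fine.
common_path = shipping_cost(2, False)
# The branch they missed: blows up only when it finally runs.
rare_path_broken = raises_name_error(shipping_cost, 2, True)
```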

In dynamic languages? Sure they may. The PHP script you thought worked crashes, the Lisp routine all of a sudden enters the debugger, things like that.

The question isn't what "may" happen, but what routinely does happen for you.

If typos are even a significant problem in your code then the solution is to type more slowly, have your IDE flag unknown symbols, use autocomplete, etc. As a solution to this problem, strong typing is both disproportionate and inadequate.

Well. I use a lot of unit testing because the idea of testing small pieces of code works for the way I develop software at work - modified pair programming. Instead of reviewing Holmes's code Watson writes tests (mostly unit ones).

In my spare time, though, I tend to be too lazy to write a lot of tests (most of my "projects" are too small and never get released to the public anyway).

I try and cover the easy 80% where it's helpful, and then specific things in the difficult 20% on an as-needed basis, or when bugs are discovered. It's an asymptotic curve: you want to spend the effort for the low-hanging fruit, but not get caught up chasing after that time-eating final bit of coverage.

A lot of the comments say tests are needed for edge cases and discovered bugs.

How does this compare with TDD?

Has anyone tried both? How did it work out for you? I implemented TDD for a project and I thought it was overkill and took a considerable amount of my time.

Do it for more projects. Don't you think the first time you did anything, it took a hell of a lot of time?

Is it me, or does Bill the Lizard close every SO thread linked from HN?
