1. In dynamic languages, simple type mismatches, wrong variable names etc. are now caught in "top level system level test". Yes these are bugs that should have been caught by a compiler had we had one.
2. There's no documentation as to how something should work, or what functionality a module is trying to express.
3. No one dares to refactor anything ==> Code rottens ==> maintenance hell.
4. Bugs are caught by costly human beings (often used to execute "system level tests") instead of pieces of code.
5. When something does break in those "top level system tests", no one has a clue where to look, as all the building blocks of our system are now considered equally broken.
6. It's scary to reuse existing modules, as no one knows if they work or not, outside the scope of the specific system with which they were previously tested. Hence re-invention, code duplication, and yet another maintenance hell.
Did I fail to mention something?
UT cannot assert the correctness of your code. But it will constructively assert its incorrectness.
In my experience, with enough poorly written unit tests and a hard line policy on unit test coverage (which often results in poorly written tests), the unit tests often turn out to be as much a barrier to code maintenance as the regular code is.
I'm an advocate of judicious unit testing, but taking it too far is worse than not doing it at all if you have integration or system / functional tests IMHO.
Code was covered without assertions, so I insisted spending some time into writing at least a few while everyone else just asked me to test. And of course once in production tons of bugs started to appear. I still believe I can write system tests though.
One nice part of having more system level tests than unit tests: go ahead and refactor a bunch. If the same inputs still give the same outputs, you should still be golden.
That's dangerously subtly wrong. It only holds if your inputs include both positive and negative cases. If you only push positive cases through your code and it still gives you the same outputs then it might be broken in subtle but non-obvious ways, aka happy-path testing.
Better be very careful with this.
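To make the happy-path trap concrete, here is a minimal sketch in Python. Everything in it (the `parse_port_*` functions and the sample inputs) is hypothetical and invented for illustration. A refactored function agrees with the original on every positive input, so a "same inputs, same outputs" check stays green, yet the two behave differently on invalid input.

```python
def parse_port_original(text):
    """Original: rejects anything that is not a valid TCP port."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def parse_port_refactored(text):
    """Hypothetical refactor that silently dropped the range check."""
    return int(text)


def same_outputs(inputs):
    """Golden-master style check: do both versions agree on these inputs?"""
    return all(parse_port_original(i) == parse_port_refactored(i) for i in inputs)


def rejects(func, text):
    """True if func raises ValueError for text."""
    try:
        func(text)
    except ValueError:
        return True
    return False


HAPPY_PATH = ["80", "443", "8080"]
NEGATIVE = ["0", "-1", "70000"]

# Only positive cases: the broken refactor looks fine.
print(same_outputs(HAPPY_PATH))
# Negative cases expose the difference between the two versions.
print([rejects(parse_port_original, t) for t in NEGATIVE])
print([rejects(parse_port_refactored, t) for t in NEGATIVE])
```

The comparison only becomes meaningful once the recorded inputs include cases the code is supposed to refuse.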
I agree, having some good tests is very good. Obviously.
But I disagree with the hypothetical statement "someone writing a lot of tests is always doing a better thing than someone writing a lot fewer tests".
No silver bullet and all that. Also annoyed by "agile methodologies" and "best practices" in general. Good sense and taste really can't be turned into a list of rules you can blindly apply anywhere. I suggest reading http://alistair.cockburn.us/Characterizing+people+as+non-lin...
I say yes. Of course this is a big, complicated, world; it probably contains good programs without tests. But in my usual experience, the moment a programmer - even a good one - neglects testing, bad stuff starts creeping into the code.
If people cared about software quality, they would use strongly typed languages before even thinking about "unit tests". What people actually want is a) to hack it out quickly and b) to have a lot of busywork to do as a form of job security. Hence 50,000 lines of "unit tests" per 10,000 lines of application...
It's not about strongly vs weakly typed languages, but statically vs dynamically typed languages.
What you are talking about are benefits of statically typed languages.
For more information, please see:
On the other hand, many of the most egregious problems are due to weak typing and not, as static typing advocates, well, advocate, to static vs. dynamic typing.
Having a really good type inference engine helps make statically typed languages much more productive - but lisps and the like have real advantages too.
I don't think this kind of criticism is healthy. You're trying to live on top of your ivory tower, calling out everyone who isn't like you as lazy and looking for job security. Do you really think there's no other reason for people to use languages that you aren't so fond of?
Honest question, because this kind of message is just unnecessary aggression.
There are certain kinds of software for which strongly-typed languages (in the sense of ML, not C/Java) are heavily underused. For those cases, parent's rant is quite appropriate and mostly justified.
For example, we really, really want to have our browsers written in a language like Rust, rather than C or C++. Same with web servers, and public-facing web applications.
When was the last time you asserted a variable's type in a statically typed language in a unit test where you weren't testing reflection?
The whole point of dynamic/duck typing is that you don't care about the specific type, only that it works as you expect it to.
I totally agree with the sentiment up thread that it's super condescending to think you're a better programmer if you use a language with a good static type system, but from a personal perspective, it's so much more pleasant to let the compiler check this trivial stuff so that I can focus on testing actual logic.
You test for functionality, not types.
I find that lots of tests end up being of this sort, that I'm uncomfortable if I don't write them, that they clutter up test files and obscure the more interesting tests, and that they are exactly what type checking catches automatically.
I'm happy to pay that, but by no means suggest it's the right trade-off for every scenario.
| +-- FloatingPointError
| +-- OverflowError
| +-- ZeroDivisionError
| +-- IndexError
| +-- KeyError
| +-- UnboundLocalError
| +-- BlockingIOError
| +-- ChildProcessError
| +-- ConnectionError
| | +-- BrokenPipeError
| | +-- ConnectionAbortedError
| | +-- ConnectionRefusedError
| | +-- ConnectionResetError
| +-- FileExistsError
| +-- FileNotFoundError
| +-- InterruptedError
| +-- IsADirectoryError
| +-- NotADirectoryError
| +-- PermissionError
| +-- ProcessLookupError
| +-- TimeoutError
| +-- NotImplementedError
| +-- RecursionError
| +-- IndentationError
| +-- TabError
| +-- UnicodeError
| +-- UnicodeDecodeError
| +-- UnicodeEncodeError
| +-- UnicodeTranslateError
Now, on this list, how many can be linked to language typing?
| +-- UnicodeError
| +-- UnicodeDecodeError
| +-- UnicodeEncodeError
| +-- UnicodeTranslateError
So basically, relying only on typing will cover about 10% of the error types, most of which are caught by linters such as flake8 and code intelligence tools such as Jedi. You are making a very weak case.
Runtime class generation and unexpected user input can actually be handled in types. Any language with Type:Type, generative modules, first-class modules, and other variations all handle various forms of runtime type generation. Something like lightweight static capabilities can ensure runtime input conforms to expectations encoded in types.
Furthermore, you are way underselling types with that restricted list. NameError, SyntaxError, ImportError wouldn't occur in any typed languages at runtime. LookupError, AttributeError, AssertionError, ArithmeticError wouldn't occur in some of them.
Finally, your breakdown doesn't cover how frequent any of these errors occur. For example, even if typing were to solve only TypeError, if TypeError consists of 90% of runtime errors, that would be a huge win.
If you can't determine the type in advance, how would you check for the type it will be? I don't understand.
> Furthermore, you are way underselling types with that restricted list. NameError, SyntaxError, ImportError wouldn't occur in any typed languages at runtime.
Absolutely not. Those are not linked to typing in any way, and any decent Python editor catches them.
> LookupError, AttributeError, AssertionError, ArithmeticError wouldn't occur in some of them.
AssertionError? Seriously? Do you even know what it does in Python?
And LookupError, unless you got constant sized containers, which has nothing to do with types, you can't check that.
ArithmeticError? Come on! Are your numbers constants? You need to check the inputs so that they belong to the domain of your problem, and that's it. Nothing to do with types.
> Finally, your breakdown doesn't cover how frequent any of these errors occur. For example, even if typing were to solve only TypeError, if TypeError consists of 90% of runtime errors, that would be a huge win.
Yes, but it's not. My last week was 90% KeyError and empty values. The input is usually the source of errors.
Types are useful, and they come at a cost. Whether you want to use them or not is a technical choice to make. But selling types the way it's been done on this thread is dishonest. Or you are all here working on algorithmic problems with very little user input, which are the easy programs to code: the hard part is the algorithm, not the code. If you are dealing with a user app, a web site, a video game, typing will not save you from 90% of the bugs, and you DO need unit tests.
Compilers do this all the time. Just consider all those programs that dynamically generate code based on objects that they haven't seen in advance, like object-relational mappers. Those can all be statically typed and ensure that they generate correct code.
> Absolutely not. Those are not linked to typing in any way, and any decent Python editor catches them.
It's all part of compilation, and furthermore, we can type check code generation so that name errors don't even occur in runtime generated code. Sorry, but all of these errors are related to type checking.
> AssertionError? Seriously? Do you even know what it does in Python?
Just google "static contract checking" to find plenty of work, including compilers already available for Haskell. Heck, code contracts have been widely deployed on .NET for years now.
> And LookupError, unless you got constant sized containers, which has nothing to do with types, you can't check that.
Please read the lightweight static capabilities paper I already provided. It demonstrates using the Haskell and OCaml type systems to statically assure that all array bounds are in range, even for dynamically sized structures. So yes, you can check that, which doesn't even go into dependent typing where checking such dynamic properties is the whole point.
> ArithmeticError? Come on! Are your numbers constants? You need to check the inputs so that they belong to the domain of your problem, and that's it. Nothing to do with types.
You don't seem to realize that types are simply logical propositions about a program. They can be ANY proposition about a program, including that indexing is within bounds, that a concurrent program has no race conditions or deadlocks, that programs terminate, that HTML forms correctly serialize/deserialize values from/to structures, and more.
Like most dynamic typing enthusiasts, you don't have a proper appreciation for the true scope of static typing. You are correct that typing has its costs that aren't always warranted, but you are incorrect about where you draw that line because I don't think you fully understand how powerful static typing truly is.
 - https://andreacensi.github.io/contracts/
I'm not sure where you got that information. You don't actually write tests to catch types. You write tests to check behavior. While it is possible that actual type errors come out of the woodwork while doing that, that is not the reason why the test was written in the first place; and after fixing such a type error, the test itself is still valid (again because it is making sure that the code exhibits the desired behavior).
I don't think I have EVER written a test just to check for types (in Python). Neither do I pepper my code with 'assert isinstance(x, some_class)' because it just goes against the grain of a dynamically typed language.
I'm talking about things like, what happens if you press this button; is this calculation correct; does this lookup give me the desired results; does X get stored in the database correctly (or retrieved from it); etc. I don't know of any type system that lets you check these things.
(Also, I assume s/strongly/statically; Python is a strongly typed language. Granted, not everybody agrees about the terminology...)
Depending on the power of your type checker, you can say (with varying degrees of convenience) things like "this integer will always be positive", "this is always going to be a non-empty list", "these two functions can be composed", "I've exhausted all possible values for this variable; I know it because the compiler told me so", "it is safe to call this function, and I know it won't touch the database or do any kind of I/O", and many more.
Whenever you called a function and it wrote to a file when you didn't want that, that's also a type error!
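Two of the claims above ("this integer will always be positive", "this is always going to be a non-empty list") can be roughly approximated even in Python with optional type hints. This is only a sketch under stated assumptions: the names `PositiveInt`, `make_positive`, `NonEmpty` and `average` are hypothetical, the guarantee holds only if values are built through the validating constructor, and it is far weaker than what ML-family or dependently typed languages enforce.

```python
from typing import Generic, List, NewType, TypeVar

T = TypeVar("T")

# A distinct static type for integers that have already been validated.
PositiveInt = NewType("PositiveInt", int)


def make_positive(n: int) -> PositiveInt:
    """Single validation point; holders of a PositiveInt need not re-check."""
    if n <= 0:
        raise ValueError(f"expected a positive integer, got {n}")
    return PositiveInt(n)


class NonEmpty(Generic[T]):
    """A list that cannot be constructed empty: the first element is mandatory."""

    def __init__(self, head: T, *tail: T) -> None:
        self.items: List[T] = [head, *tail]

    def first(self) -> T:
        # Safe by construction: there is always at least one item.
        return self.items[0]


def average(total: int, count: PositiveInt) -> float:
    # A static checker rejects a plain int here, so an unvalidated zero
    # cannot reach the division without going through make_positive.
    return total / count


print(average(10, make_positive(4)))
print(NonEmpty(3, 1, 2).first())
```

The runtime check still exists, but it lives in one place, and the signatures of everything downstream document that the check has happened.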
If you do it you are doing it wrong.
Check this out: https://github.com/Tygs/tygs/blob/master/tests/test_componen...
The type checks are "isinstance". It's 10% of the checks, and most of the time I don't even have those, my colleague insisted.
The rest are the most important tests: they test behavior, not types.
Do not assert a dynamic language means testing types, it's just not true. Bad testers test types.
Types can encode much more. They can encode behavior.
Also, in Elm as an example, there is no runtime class generation, the type of user input that is possible is checked at compile time, and you're forced to explicitly handle many of these potential failures. If you want to crash the program if you get unexpected user input, you literally need to type "Debug.crash" and then you pretty much deserve what you get.
If you're actually interested in maintainable code you should concentrate on reducing the state space of your program wherever possible, then using exhaustive (if possible) or property testing to fill in most of the rest of the gaps. Unit tests can help too, but I really think of them as less useful than all of the above.
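As a rough illustration of "reduce the state space, then test it exhaustively", here is a small Python sketch. The `Light` enum and `next_light` function are hypothetical examples, not taken from any project in this thread. Because the inputs are an enum and a boolean rather than free-form strings, there are only six possible cases and the test can try all of them.

```python
from enum import Enum
from itertools import product


class Light(Enum):
    RED = "red"
    GREEN = "green"
    YELLOW = "yellow"


def next_light(light, emergency):
    """Hypothetical transition function over a deliberately tiny state space."""
    if emergency:
        return Light.RED
    return {
        Light.RED: Light.GREEN,
        Light.GREEN: Light.YELLOW,
        Light.YELLOW: Light.RED,
    }[light]


def check_all_states():
    """Exhaustive test: 3 lights x 2 flags = 6 cases, so try every one."""
    checked = 0
    for light, emergency in product(Light, [False, True]):
        result = next_light(light, emergency)
        assert isinstance(result, Light)  # never leaves the state space
        if emergency:
            assert result is Light.RED  # emergency always forces red
        else:
            assert result is not light  # a normal step always changes the light
        checked += 1
    return checked


print(check_all_states())
```

When the state space is too large to enumerate, the same properties can be checked on generated inputs instead, which is the property testing mentioned above.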
Also, you should exclude:
| +-- IndentationError
| +-- TabError
Finally, I would argue that most instances of this exception are caught by a type system:
| +-- KeyError
> I'd also exclude:
> +-- StopIteration
> +-- StopAsyncIterat
> which are used for mere communication rather than describing real error cases.
> Also, you should exclude:
> +-- SystemExit
> +-- KeyboardInterrupt
> +-- GeneratorExit
def test_ready_keyboard_interrupt(aioloop, app):
    beacon = Mock()
    real_stop = app._finish
    app._finish = beacon_stop
    assert beacon.call_count == 3
> +-- ImportError
> +-- MemoryError
> +-- SyntaxError
> | +-- IndentationError
> | +-- TabError
> Finally, I would argue that most instances of this exception are caught by a type system:
> | +-- KeyError
KeyError should be super mega tested:
- keys can be any hashable object in Python, not just strings;
- keys are very often generated on the fly, not constants;
- dict are mutable, you can add or remove keys.
By encoding the possibility of a missing key in the return type, you force the programmer to deal with it in the program.
So in fact, having a good type system can help deal with those kinds of errors as well.
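A minimal Python sketch of that idea, using optional type hints. The function names and the sample catalog are hypothetical. The lookup returns `Optional[float]` instead of raising KeyError, so the "missing key" case is part of the signature.

```python
from typing import Dict, Optional


def find_price(prices: Dict[str, float], item: str) -> Optional[float]:
    """Return the price, or None if the item is unknown."""
    return prices.get(item)


def describe(prices: Dict[str, float], item: str) -> str:
    price = find_price(prices, item)
    # With a static checker such as mypy, using `price * 100` before this
    # None check is expected to be reported as an error.
    if price is None:
        return f"{item}: not for sale"
    return f"{item}: {round(price * 100)} cents"


catalog = {"apple": 0.5}
print(describe(catalog, "apple"))
print(describe(catalog, "pear"))
```

Python only enforces this when a type checker is actually run; languages with a built-in option or maybe type make the same check mandatory at compile time.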
> Well, doh :) Type system or unit tests won't catch that. Tooling, tooling, tooling.
You need neither unit tests nor a type system to catch those. And you also don't need any tooling. The Python interpreter will throw these exceptions as soon as your program starts.
(Unless, of course, you rely on stuff like heavy dynamic importing at runtime, but that's really rare. I usually see this only for web servers in debug mode, where they auto-reload the app after a changed source file. But then my above comment applies: The faulty program crashes right away, you can't miss that.)
> - keys can be any hashable object in Python, not just strings;
EDIT: I stand corrected, I don't know enough about Python in particular. The argument is still valid for other languages with weak types.
Python is dynamically typed. It is NOT weakly typed.
Python is strongly typed, as well as dynamically typed.
In [11]: 1 + '1'
TypeError                                 Traceback (most recent call last)
<ipython-input-11-861a99da769e> in <module>()
----> 1 1 + '1'
TypeError: unsupported operand type(s) for +: 'int' and 'str'
I certainly don't disagree with what I guess will be the overall direction of your argument.
You are making a very weak case.
> either something completely stupid like a typo...
Typos are a bigger problem in languages that silently instantiate a new variable in response. If it's on the rhs, would you prefer your IDE to warn you about using an uninitialized variable, or let it go? Your coding flow is paid for by more failures (some of which will be WTFs) in testing (and testing is part of coding these days, right?). That may be the right trade-off, in which case you may be able to disable warnings in your IDE.
>...or some edge-cases not covered properly by code logic (which only proper testing and QA can catch).
The 'only' suggests that they are inevitable, and they are in a statistical sense, but whenever my code logic fails to cover an edge case, I ask myself why I overlooked it. Sometimes it is not something I could have anticipated even if I had thought more about what I was doing, but often it is.
Maybe the languages you are using aren't capable of encoding the kind of screw ups you do make as type errors? Typos are trivially covered by most static type checkers, but like you say that's not a very interesting kind of error (except, of course, with languages that silently introduce fresh variables. Those can introduce hard to find errors...). More interestingly, they can also handle other problems like dereferencing nulls, not considering all possible patterns of a value (think "case" conditions), calling a function on the wrong kind of value, etc.
The majority of development is performed by people in corporate environments, 9->5. They simply know that they "have to" write unit tests.
No reasoning is performed, simply churn out the code to satisfy the boss and walk out at 5pm.
I just tried to put that code behind me and yes, those "tests" were there because they were mandatory.
For the most part, unit tests aren't organized around what a unit is supposed to do from a use case perspective, but more about how it happens to be implemented today. The article refers to this with the inability of most developers to indicate which business requirement would fail if the test failed.
When you're attempting to understand or refactor something where the testing for the most part tests the implementation of this particular unit rather than its purpose, the tests themselves offer little more in either documentation or refactoring confidence than the implementation itself.
What's worse, making your code "more testable" often compounds this effect. As code is taken to an extreme level of modularity, and unit isolation is enforced rigidly in tests, the tests often do nothing other than specifically test every line of the implementation of the unit (i.e. method 'foo' on class 'Bar' calls method 'a' on class 'B' and passes its results to method 'x' on class 'Y', returning that result).
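The difference between the two kinds of test can be sketched in Python as follows. The `Checkout` and `TaxTable` classes are hypothetical, made up purely to contrast a test that restates the call graph with one that states a requirement.

```python
from unittest.mock import Mock


class TaxTable:
    def rate_for(self, country):
        return {"FR": 0.20, "DE": 0.19}[country]


class Checkout:
    def __init__(self, tax_table):
        self.tax_table = tax_table

    def total(self, net, country):
        return round(net * (1 + self.tax_table.rate_for(country)), 2)


def test_mirrors_implementation():
    """Restates the call graph: passes only while total() calls rate_for() this way."""
    table = Mock()
    table.rate_for.return_value = 0.20
    Checkout(table).total(100, "FR")
    table.rate_for.assert_called_once_with("FR")


def test_states_behaviour():
    """States the requirement: 100 net in FR costs 120 gross, however it is computed."""
    assert Checkout(TaxTable()).total(100, "FR") == 120.0


test_mirrors_implementation()
test_states_behaviour()
```

If `total()` were rewritten to cache rates or look them up in bulk, the first test would fail although nothing a user cares about changed, while the second would keep passing.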
One example is where there might be preconditions for calling a function. This type of information is necessary to know but can be hard to work out even with rich documentation. Without documentation, you are going to be reading a lot of code to try and reverse-engineer the internals.
With unit tests, you can do a quick "find all references", see how the method is called within different tests cases and be confident that the setup actions in the given test fixtures will work for you.
Saves writing 12 unit tests that show passing null breaks the code. No one cares about that.
Asserts are documentations on valid inputs and outputs that very cleanly break when violated.
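A short Python sketch of asserts used this way. The `allocate` function is a hypothetical example: its preconditions and postcondition are written as assertions, so the valid inputs and outputs are documented in the code and a violation fails immediately at the point of misuse.

```python
def allocate(total_cents, shares):
    """Split total_cents proportionally to shares; leftover cents go to the first parts."""
    # Preconditions: documentation of valid inputs that fails loudly when violated.
    assert total_cents >= 0, "total_cents must be non-negative"
    assert shares and all(s > 0 for s in shares), "shares must be non-empty and positive"

    weight = sum(shares)
    parts = [total_cents * s // weight for s in shares]
    # Integer division may leave a few cents undistributed; hand them out one by one.
    for i in range(total_cents - sum(parts)):
        parts[i] += 1

    # Postcondition: no cent is created or lost.
    assert sum(parts) == total_cents
    return parts


print(allocate(100, [1, 1, 1]))
```

One caveat: Python strips `assert` statements when run with `-O`, so this style documents and checks assumptions during development and testing rather than validating untrusted input in production.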
It feels like discussing this without real code is likely to cause confusion. The type of systems I have in mind are those that invert control through e.g. pub/sub events, plugins, strategies, dependency injection etc. Unit tests help document how to use components in a system that has traded clarity of control flow in exchange for extensibility.
How a system is wired together is not that exciting and not worth unit tests. If you screw up loading a config file on startup, I bet your first integration test fails. Why would you unit test this?
Too fine grained unit tests actually make refactoring more difficult, because a refactor typically touches so much code that they have to be rewritten. But component tests do provide the safety net that you had in mind when you wrote your reaction.
> Ok let's stop writing UT and see what happens. Wait... we've already tried that, and we know the result pretty well:
Stuff still doesn't work and doesn't get delivered on time. That's all we know, and unit tests didn't change that.
> Yes these are bugs that should have been caught by a compiler had we had one.
WHAT? What kind of dynamic languages are you working with?
> 2. There's no documentation as to how something should work, or what functionality a module is trying to express.
Unit tests - at least those made in TDD style - tend to obfuscate that; they usually reflect the implementation instead of the interface anyway.
> 3. No one dares to refactor anything ==> Code rottens ==> maintenance hell.
You seem to know very timid programmers. Or the kind who think tests are their safety net, so they don't bother to actually read the code they're changing and understand what it does.
> 5. When something does break in those "top level system tests", no one has a clue where to look, as all the building blocks of our system are now considered equally broken.
Look at the crash log. Or think for just a second instead of saying "it's broken" and giving up.
> 6. It's scary to reuse existing modules, as no one knows if they work or not, outside the scope of the specific system with which they were previously tested. Hence re-invention, code duplication, and yet another maintenance hell.
Re-invention, code duplication, etc. have nothing to do with tests (hell, most unit tests I've seen have quite a lot of code duplication in them). It's not scary to reuse existing modules without tests, it just requires to... use them.
Pretty much all the points you listed are examples of people shutting down their brains and hoping someone else's unit tests will think for them. This is a very bad approach to programming. It's the wrong kind of laziness.
The point number 2 simply proves that you don't know what a proper unit test looks like if you say that a unit test obfuscates the documentation.
The point number 3 just confirms what I thought, that you believe that you don't need tests because you don't introduce bugs. The most interesting part is that you think that other people that write unit tests are lazy. Very interesting position, I would say that whoever tries to refactor a big chunk of legacy code without adding an appropriate coverage is certainly lazy and worse overconfident and probably never did anything similar in the past.
The point number 5 proves that you have never, ever, worked on a big project if you think that it's enough to look at the logs to find a bug. It's common knowledge nowadays that the earlier a bug is caught, the less effort it takes to fix it.
Finally, I think it should be clear that the very bad and toxic approach to programming is not the one that involves writing unit tests.
This approach tries to formalise the requirements in a set of scenarios, describing each condition that must be satisfied for a given event/input.
Your approach seems focused on just writing code, without a proper design phase, without understanding completely the requirements with a close feedback cycle with the users and with a complete disregard of existing functionality that you think you can write better because you are a better programmer given that you don't need unit tests.
In my opinion not writing tests is laziness.
Muhahahaha! Sometimes it's almost impossible to understand a piece of code, especially if that code is old and has had comb-overs from a lot of developers. Unit tests at least might give some hints to what's happening.
Doing so forces you to define several constraint-based models for your program, and then layering mutation testing over the top helps you figure out how good your tests actually are.
Now they have a cost, and sometimes they are not worth it, but they are certainly not useless.
Unit tests seem mostly worthwhile as documentation or tutorial for how to use something or for what something is "supposed to" do. But as validation of completeness, correctness, or coverage of your design and implementation... they fall extremely short and leave a lot to be desired.
In either case, if software quality and reliability actually matters to your problem domain, then mutation testing should be considered an absolute prerequisite to any kind of testing method (unit, property-based, integration, etc.) otherwise you have almost no way to measure the quality of your tests.
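The idea behind mutation testing can be shown with a hand-rolled Python sketch. This is only an illustration: real mutation testing tools generate and run the mutants automatically, whereas here a single mutant is written by hand, and all names are hypothetical. A test suite is judged by whether it fails when the code is deliberately broken.

```python
def is_adult(age):
    return age >= 18


def is_adult_mutant(age):
    """Hand-written mutant: '>=' replaced by '>'."""
    return age > 18


def weak_suite(func):
    """Executes every line of is_adult, but never checks the boundary."""
    return func(30) is True and func(5) is False


def strong_suite(func):
    """Same checks plus the boundary case."""
    return weak_suite(func) and func(18) is True


def survives(mutant, suite):
    """A mutant survives when the suite still passes against it."""
    return suite(mutant)


# Both suites pass on the real code and give full line coverage of it.
print(weak_suite(is_adult), strong_suite(is_adult))
# Only the strong suite notices that the mutant is broken.
print(survives(is_adult_mutant, weak_suite))
print(survives(is_adult_mutant, strong_suite))
```

A surviving mutant points at a behaviour that no test pins down, which is information that a coverage percentage alone does not provide.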
The speed difference though, integration tests just take too long to be able to completely replace unit tests.
Fast enough to have it in a loop for by the book TDD? No but I'm not sure by the book TDD adds any value. You mash test once before committing and if all is good you go ahead.
I worked on a big ETL database migration tool in Python and it had a top level integration test with the smallest possible subset of data, like 500k end to end. Then each subsystem (it was architected like a nanopass compiler) had a more complex set of integration tests that only checked that pass but more rigorously. The whole project (10k lines of code) had under 10 unit tests that double checked tricky low level primitives. Every test was independent and could be run in parallel, my longest test was just under a minute, my shortest was about 1 sec. The shortest tests were run on module import ensuring both an easy affordance to testing and a tight feedback loop with no unit tests to impede refactoring.
Of course you can create unit tests with fake data, though IMO it's a bad practice. I think it's best to have unit tests working on real production-like data (in addition to having edge-casey data tested as well).
Unit tests of complex code are often more complex and obfuscated than the code itself, hence their usefulness as documentation is very limited.
>3. No one dares to refactor anything ==> Code rottens ==> maintenance hell.
>4. Bugs are caught by costly human beings (often used to execute "system level tests") instead of pieces of code.
Lack of unit tests doesn't necessarily mean lack of automated tests. Functional tests, when applicable, are far more useful in my opinion than unit tests.
I've found that expressing unit tests in the language of the problem domain and never stressing implementation details wherever possible is the ONLY way to prevent this from happening. Avoid mocking and test doubles where possible, but sometimes they're the right tool. The tests should be treated as production code that gets boyscout rule'd as people go into it for whatever reason.
I agree, but a lot of people (at least in my environment) would play the "that's not a unit test" card, which is fine by me; we should care about the least intrusive way to automate code tests regardless of the philosophy behind it.
Yep. Integration tests under "top level system test" and above "tightly coupled unit tests" are not only excellent at catching bugs in integration code, they don't require hours of tedious mocking and don't have to be completely rewritten when you refactor the code base.
>No one dares to refactor anything
Is a problem on heavily unit tested code. Unit tested = tests that are tightly coupled to the code = refactoring turns the tests red whether or not anything was broken = people prefer to leave the code alone.
That is, when you refactor the old pieces are still there (in the libraries), and their tests still work, but you are maybe not using them anymore higher up in the hierarchy.
The author went to some length to point out (correctly) that test automation is an orthogonal issue. Reading beyond the title, we see the issue is that an obsession with unit tests, and especially an obsession that is focused entirely on a measure as simplistic as code coverage, tends to result in a sub-optimal use of development resources (which are always limited.) A redundant or otherwise unhelpful test (of any sort) does not gain value by being run automatically.
It's been around a year since development has basically frozen on that project.
That functionality had to go somewhere, no matter how badly written it was.
Unit tests don't magically save you from code bloat.
Also I work on 3 code bases written by other people with zero unit tests, and I refactor them all the time. Other developers have refactored them. Not having unit tests is not a big barrier to refactoring.
No, but the idea is that such a method would be hard to test, considering it (presumably) does so much. So in order to test it properly, you are encouraged to break it up into smaller methods. (Whether somebody actually does that is a different story, of course...)
Demanding those unit tests might indeed force them to write shorter methods. It doesn’t force them to write good code. By “good” I mean fit for the particular project and requirements: I don’t believe there’re universal criteria for code quality.
The only way to fix that and make them better programmers — teach them, not force them to waste their time writing those unit tests.
Besides, writing and running system tests teaches the developers about other system and OS components — and they better know and understand (to some extent) your complete technology stack, not just the layer they’re currently working on. Writing and running unit tests teaches them nothing interesting; it’s boring and involves a lot of copy-pasting.
It's absolutely true that there's almost never a time when a 1500 line description of a business process is the optimal approach. Conversely though, if abstractions are done poorly, those same 1500 lines split into a maze of different classes that don't cleanly abstract a particular, human-comprehensible part of the process can be just as hard to deal with.
The initial people who came up with the idea thought about writing down the execution of a use case, or a small part of that, as a test. Then they ran their code against it while developing it. That gave them insight into the use case as well as the API and the implementation. This insight could then be used to improve tests, API and implementation.
But most professionals aren't about making quality. They are about paying their rent. So when they started to learn unit tests, they just wrote their code as always, and then tried to write tests, no matter how weird or unreasonable, to increase the line coverage of their test suite. The proudest result for them is not to have a much more elegant implementation, but to find the weird test logic that moved them from 90% coverage to 91%.
I believe that's how you get a lot of clutter in your unit tests. However, what is described in the document are sometimes examples of people really trying, but who are just early in their development. Of course when you learn to do something by a new method you will first do crappy, inefficient stuff. The question is how much you listen to feedback. If that team that broke their logic to get higher coverage learned that this was bad, then they probably adapted after some time, and then they did exactly what unit tests are there for.
But anybody who has edited a 100,000-line project, run the unit tests, seen red lines, fixed them, run them again and then seen green knows the feeling: it's great. You are way more confident.
This is the key insight: unit tests are about feelings more than anything else. People get so defensive in unit testing threads because eliminating unit tests means eliminating their source of confidence.
For example, I'm working on an internal project that creates VMs with some provider (be it Virtual Box, AWS, etc) and then deploys a user defined set of docker containers to it. I've found that I don't have bugs in situations I would typically test using mocking/stubbing/etc in traditional unit tests. I usually need to have the real AWS service with the docker service running to get any value out of the test. And at that point it's more work to mock anything else than it is to just start up the app embedded and do functional testing that way.
I'm becoming more of a fan of verifying my code with some good functional tests in areas that feel like high risk and then some contract testing for APIs other apps consume. Then if I find myself breaking areas or manually testing areas often I fill those in with automated tests.
* Keep system level integration tests [for up to a year].
* Keep unit tests that test key algorithms for which there is a broad, formal, independent oracle of correctness, and for which there is ascribable business value.
In my experience, an overarching test layering strategy works best in providing maximum coverage for minimum friction: Add tests bottom-up, at each layer testing the core functionality of that layer, using directly the layers beneath. The most valuable tests are the system-level tests, which, hopefully, exercise a large swath of the underlying codebase. This reduces to the author's recommendation for most cases.
Some people are proud to ditch UnitTesting[TM] to avoid the unit test cargo cult, in particular the proliferation of "decoupled" tests via overuse of mocking libraries. There is very little value in "unit tests" that reproduce the code flow via mocking asserts. Alas, a lot of codebases that are heavy on unit testing degenerate into gratuitous mock fests, which become incredibly onerous to work with.
It should have been.
* Keep system level integration tests until the requirements for system level integration change
* Keep unittests. And increase code coverage.
Any code that is not executed in a unittest will eventually break. Especially in interpreted languages.
* Obviously code coverage is not enough. But in my experience code coverage is a minimum requirement, not something to be happy about once reached.
* The biz cases also need to be tested, which is usually done by a combination of unit tests, integration tests, system tests and end-to-end tests.
It adds value in the sense that if someone randomly types garbage into that file it will break, but it acts as a barrier to refactoring or business requirements change, as pretty much nothing but the exact implementation of the method will satisfy the test, and offers no documentation benefit over the code itself.
Nobody gets fired for writing too many tests.
It is a waste of time testing for a pre-condition. Ensuring a pre-condition is the responsibility of the caller and tests in that part of the code, or integration tests, should be responsible for identifying problems in that regard.
Like anything else, all things in moderation.
That's why I love the "dump results into a single large string and compare" style of testing. Declaring a given output as known good (until proven otherwise) is a bit of a gamble, because yes, you do think harder when you are writing individual asserts. But when something changes it is very easy to tell a bad, unexpected change in the output from a good one and replacing the previous known good copy with the new one is just a few keystrokes away. A comfortable anything-to-JSON tool and a good diff viewer should be part of any testing arsenal.
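A minimal sketch of that style in Python, to make it concrete. Everything here is an assumption made up for illustration: `build_report` stands in for whatever code is under test, and the known-good copy is kept as a string (in practice it would more likely live in a file next to the test). `json.dumps` with `sort_keys` plays the anything-to-JSON tool and `difflib` a very basic diff viewer.

```python
import difflib
import json


def build_report(orders):
    """Hypothetical code under test: summarize a list of orders."""
    return {
        "count": len(orders),
        "total": sum(o["price"] * o["qty"] for o in orders),
        "skus": sorted({o["sku"] for o in orders}),
    }


def snapshot(value):
    """Dump any JSON-serializable value as a stable, diff-friendly string."""
    return json.dumps(value, indent=2, sort_keys=True)


def diff_against_known_good(actual, known_good):
    """Return a unified diff as text; an empty string means nothing changed."""
    lines = difflib.unified_diff(
        known_good.splitlines(),
        actual.splitlines(),
        fromfile="known_good",
        tofile="actual",
        lineterm="",
    )
    return "\n".join(lines)


ORDERS = [
    {"sku": "A1", "price": 5, "qty": 2},
    {"sku": "B2", "price": 3, "qty": 1},
]

# The known-good copy, declared correct until proven otherwise.
KNOWN_GOOD = snapshot({"count": 2, "skus": ["A1", "B2"], "total": 13})

if __name__ == "__main__":
    diff = diff_against_known_good(snapshot(build_report(ORDERS)), KNOWN_GOOD)
    print(diff or "output unchanged")
```

When the diff is non-empty you read it, decide whether the change is good or bad, and if it is good you overwrite the known-good copy with the new output.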
Every middle manager on the planet thinks their product/system/whatever is the most critical part of the business, and any errors are unrecoverable and must be avoided, regardless of cost.
People tend to forget that testing comes with cost, and pretending like there are zero benefits to not performing certain types of testing is just asking for your competitors to outpace you.
If you build your system the right way, bugs simply can't kill you.
It's also contradictory to say that "features" can be tested as a single unit, aka function or class.
Unit tests are not free, as they are also code; that much is obvious. Coplien, however, also delves into the less obvious aspects of the impact of unit tests on design, and into the organizational aspects. Ultimately, coding patterns are going to reflect the incentives that govern the system.
Software development is a lot about trade-offs. There is plenty to be learned here about how to make them. An addendum by him can be found here: http://rbcs-us.com/documents/Segue.pdf but the meat is in the 2014 article.
it really brings out code smells: if you need mocks injected everywhere instead of being able to use dependency injection cleanly, it shows. if you have code paths that can only be triggered within events, it shows, etc.
having "wasteful" unit testing is more an investment for the future: when users come with real bugs, the ability to reproduce their steps in code and fix that in a no-regression suite is invaluable, but requires your app to be testable in the first place, lacking which you are stuck with manually testing stuff or, even worse, (cough) selenium
Code that is hard to test is poorly architected. Poorly architected means it's hard to test.
Example. Take this code (I'm reusing an example from this thread since I think it represents typical "well-factored" code, and isn't an obtuse example to prove a point):
raise "Can't save an invalid object"
def initialize(foo_validator, foo_factory)
@foo_validator = foo_validator
@foo_factory = foo_factory
raise "Invalid - can't save object"
The code is more extensible, for sure, but in a way that probably doesn't matter. I can pass in any validator I want! Amazing! Except, in 99% of cases, I'm going to have one way of validating something. The same goes for saving.
I would posit that at least 90% of the time that I see dependency injection in a codebase, it's there solely to aid testing and almost never adds practical (as in, actually being used and not just theoretical) value to a codebase.
IOW, "if we take things to the extreme bad stuff will happen" contains its own solution.
Regarding "should we only exercise them from the UI level" - I'm not 100% sure what you're getting at - but if your point is that we should focus our testing on business-facing use cases and not trivialities of what class calls what method, then we're speaking past each other and are in complete agreement.
Regarding validation - depends on who you ask. Single Responsibility Principle taken to an extreme would probably support the idea of having a single class whose purpose is to validate an object.
Regardless of nitpicking though, the point is that dependency injection often makes it harder to reason about code (as large parts of the "business logic" of code are relegated to a dependency that isn't obvious to locate), and is often done strictly for the benefit of the tests, rather than the functionality or comprehensibility of the code.
Does it make more sense for a human to do all the aspects of the testing by hand? Of course not. Nobody has budget for that. It's much better to automate as much testing as possible so testers can focus on higher level tasks, like the risk assessment involved in marking a build as releasable.
Then, unit testing encourages people to construct their software for verification. This software construction paradigm in itself is enough of a benefit even if unit tests are absent.
Construction for verification diminishes coupling, and encourages developers to separate deterministic logic from logic depending on unreliable processes that require error handling. Doing this frequently trains you to become a better developer.
Unreliable processes can be mocked and error handling can be tested in a deterministic way.
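As an illustration of that last point, here is a small Python sketch using the standard library's `unittest.mock`. The `PaymentGatewayDown` exception, the `gateway.charge` method and the retry policy are all hypothetical names invented for the example; the only real API used is `Mock` and its `side_effect` attribute.

```python
from unittest import mock


class PaymentGatewayDown(Exception):
    """Hypothetical error raised by the unreliable external service."""


def charge_with_retry(gateway, amount, attempts=3):
    """Deterministic logic: retry a flaky call, then give up cleanly."""
    for _ in range(attempts):
        try:
            return gateway.charge(amount)
        except PaymentGatewayDown:
            continue
    return None  # the caller treats None as "payment failed"


def test_recovers_after_two_failures():
    gateway = mock.Mock()
    # side_effect as a list: raise, raise, then return a receipt.
    gateway.charge.side_effect = [
        PaymentGatewayDown(),
        PaymentGatewayDown(),
        "receipt-1",
    ]
    assert charge_with_retry(gateway, 10) == "receipt-1"
    assert gateway.charge.call_count == 3


def test_gives_up_when_service_stays_down():
    gateway = mock.Mock()
    # side_effect as a single exception: every call raises.
    gateway.charge.side_effect = PaymentGatewayDown()
    assert charge_with_retry(gateway, 10, attempts=2) is None
    assert gateway.charge.call_count == 2
```

The failure sequence is scripted, so the error-handling path runs the same way every time, which is hard to arrange against a real flaky service.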
For example, I almost always do "gold master" testing when refactoring a large unit or module of code (test the big picture input / output given a few cases without regard for fine-grained tests within, refactor away as long as you can keep the tests green). It's an amazing way of refactoring as it acts almost like a safety harness - you have immediate feedback when you've done something wrong and changed the behaviour of the class. After the refactoring is done, however, those tests are almost useless, as they don't test the purpose of a class but just dumbly look at the input and output.
I think a lot of the tests done via TDD should be looked at in the same way.
It might make sense if you're working for a huge corporation with a LOT at stake. Unit tests then become a form of risk management - It forces employees to think REALLY LONG AND HARD about each tiny change that they make. It's good if the company doesn't trust their employees basically.
I MUCH prefer integration tests. I find that when you test a whole API/service end-to-end (covering all major use cases), you are more likely to uncover issues that you didn't think about, also, they're much easier to maintain because you don't have to update integration tests every time you rename a method of a class or refactor private parts of your code.
About the argument regarding using unit tests as a form of documentation engine; that makes sense but in this case you should keep your unit tests really lightweight - Only one test per method (no need to test unusual argument permutations) - At that point, I wouldn't even regard them as 'tests' anymore, but more like 'code-validated-documentation'; because their purpose then is not to uncover new issues, but rather to let you know when the documentation has become out of date.
I think if you're a small startup and you have smart people on your team (and they all understand the framework/language really well and they follow the same coding conventions), then you shouldn't even need unit tests or documentation - Devs should be able to read the code and figure it out. Maybe if a particular feature is high-risk, then you can add unit tests for that one, but you shouldn't need 100% unit test coverage for every single class in your app.
Which may very well be true! But I am amazed at the conclusion: That because tests are badly written, writing tests is a bad thing. No! Any code can be badly written, it doesn't mean that writing code is a bad thing. Tests, like any other piece of code, also need to be designed and implemented well. And this is something you need to learn and get experience with.
As to whether well-written unit tests are worth it, I cannot imagine how someone could efficiently maintain a codebase of any size without unit tests. Every little code change is a candidate to break the whole system without them, especially in dynamic languages.
Testing spoils the fun as now I need to write another piece of code for each, single thing that my original piece of code is doing.
I am no longer a wizard casting fireball into a room. I'm also the guy that has to go over the corpses and poke each one with a stick strong enough and for long enough to check if they are absolutely totally dead.
From time to time some of the people I lead will die due to my sloppiness but such is life. It's not like I'm leading royal family through this dungeon.
I like Haskell because I can skip most of the unit tests. Integration tests are still good, and some unit tests like "confirm that test data are equal under serialization and then deserialization" help with development speed. But I can usually refactor vast swathes of code all I want without having to worry about breaking anything.
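For readers who haven't seen that kind of test, a round-trip check looks roughly like this. The sketch is in Python rather than Haskell, and the `User` type and its JSON encoding are hypothetical, chosen only to show the shape of the test.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class User:
    name: str
    age: int
    tags: tuple


def to_json(user):
    """Serialize a User to a JSON string."""
    return json.dumps(asdict(user))


def from_json(text):
    """Rebuild a User from the string produced by to_json."""
    data = json.loads(text)
    # JSON has no tuple type, so the list must be converted back.
    return User(name=data["name"], age=data["age"], tags=tuple(data["tags"]))


SAMPLES = [
    User("ada", 36, ()),
    User("grace", 85, ("navy", "cobol")),
    User("", 0, ("",)),
]


def test_round_trip():
    for user in SAMPLES:
        assert from_json(to_json(user)) == user
```

The test says nothing about what the serialized form looks like, only that nothing is lost on the way there and back, so it survives most refactors of the encoding.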
If you do write unit tests and your test passes on the first try, make sure you change the output a little bit to ensure it fails. It's more common than you'd think to accidentally not run a test.
A lot of people seem to miss much of Kent's subtly but intentionally phrased advice. Unit tests are a liability, so use them responsibly and as little as possible, but not at the expense of removing confidence in your software.
Also, delete tests that aren't doing you any favors.
Unit tests are not albatrosses around the neck of your code, they are proof that the work that you just did is correct and you can move on. After that they become proof that any refactor of your code was correct, or if the test fails and doesn't make sense, that the expectations of your test were incorrect. When you go to connect things up after that, and they don't work, you can look at the tests to verify that at least the units of code are working properly.
I am no TDD fan, but I do believe that writing your code in a way that makes it easy to test generally also improves the API and design of the entire system. If it's unit testable, then it has decent separation of concerns; if not, then there may be something wrong (and yes, this applies to all situations). I use this methodology for client/server interactions as well, where I can run the client code in one thread and the server in another, with no sockets, to simulate their functioning together (thus abstracting out an entire area of potential fault that can be tested in isolation from network issues).
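One way such a socket-free setup could look, as a rough Python sketch. The transport here is a pair of in-memory queues, and the "protocol" (uppercase every message, `None` means shut down) is invented purely for illustration; the original setup is not described in enough detail to reproduce. The server runs in its own thread while the client runs in the calling thread.

```python
import queue
import threading


def server(requests, responses):
    """Hypothetical server loop: uppercase each message until told to stop."""
    while True:
        msg = requests.get()
        if msg is None:  # sentinel: shut down
            break
        responses.put(msg.upper())


def client(requests, responses, messages):
    """Send each message and collect the replies, in order."""
    replies = []
    for m in messages:
        requests.put(m)
        # A timeout keeps a broken server from hanging the test forever.
        replies.append(responses.get(timeout=2))
    return replies


def run_exchange(messages):
    """Wire client and server together in-process, with no sockets."""
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=server, args=(requests, responses), daemon=True
    )
    worker.start()
    try:
        return client(requests, responses, messages)
    finally:
        requests.put(None)  # always stop the server, even on failure
        worker.join(timeout=2)
```

Because both sides only see `get` and `put`, the same client and server functions could later be handed a socket-backed object with the same two methods, and network problems could be tested separately.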
The article/paper raises good points about making sure that the tests are not just being written for the sake of code coverage, but to say they are useless is just sloppy. Utilize the testing pyramid; if you adhere properly to that, everything about your system will be better.
I have a serious question, given that this was written by a consultant, is it possible that tests get in the way of completing a project in a timely manner, thus causing a conflict of interest in terms of testing?
 - http://martinfowler.com/bliki/TestPyramid.html
Don't confuse testing in general with unit testing. Just use the right tool for the job. If your unit tests aren't catching a material number of bugs compared to the effort spent, compared to other testing methods, then don't do them. Unit tests have benefits such as quicker execution time, etc. - but that has to be weighed against cost.
That is not always the case, but that happens in most of the large refactorings I've been involved in.
It doesn't invalidate the fact that unit tests increase your confidence in your own code.
As it happens, I am a Rust fanboy with all that entails, so I (like everyone else in such discussions) am clearly biased.
My point was in jest, as the “;-)” (because HN simply discarded a U+1F609) indicated.
Unit tests are also pretty quick to write once you have the mocking setup.
I've worked both on code where writing the tests was more effort than the code, and on code where writing the tests was easy, quick and helpful. The latter makes sense, after all a good test is straightline code, zero ifs, zero loops. But the former?
I think the key is that mocking should be used sparingly, but without hesitation.
The dirty little secret that nobody talks about is the cause of the ghosts: poorly engineered tests.
That's usually a sign that they've been engineered poorly or you have bugs in your code.
System level integration tests need appropriate environmental isolation and solid asynchronous & multithreaded code. Nobody can be bothered to write these properly for tests, hence the flakiness ("ooh let's just insert a sleep here" / "eh, does it really matter which version of postgres we run?").
Unit tests can only prove your software to be buggy. Even for the extremely simple "two, a function that, given no arguments, returns the number 2", your unit test can't verify that to be _always_ true. It may fail to return on Tuesdays (https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619), internally use a web service that doesn't work on leap days (http://www.wired.com/insights/2012/02/leap-day-azure-outage/), other calls may overwrite the 'constant' it returns (http://programmers.stackexchange.com/questions/254799/ever-c...), etc.
That is true, I don't think anyone argues against that point. What unit tests do is verify that the code works under specific conditions (as defined in the test's set up). When unit tests fail under those specific conditions, then you know there is a problem.
Another thing I like about unit tests is that it is much easier (and faster) to test different combinations of conditions.
Then, writing tests for the sake of code coverage is like taking a driving license exam and being judged on the streets you visited or how much gas you used. Tests need to be written with logic verification in mind.
I am deeply sad this article got upvoted. People should have a sense of responsibility when giving visibility to stuff like this.
I've heard the "executable documentation" line but my general impression is that people usually read the production code. They'll only look at the tests when they fail or they need to modify them.
I'm rather fond of this system, personally.
Did you actually read it? Because what you just wrote, is almost exactly the same as what the article author wrote.
His point, as I read it, was that unit tests can have purpose when you don't blindly make them for code coverage.
His case for having unit tests was:
> Keep unit tests that test key algorithms for which there is a broad, formal, independent oracle of correctness, and for which there is ascribable business value
But I don't think he would disagree with the case of providing some "executable documentation" as well. A few lines of code that demonstrate how to use a class or library, along with expected results, are definitely valuable. And those are exactly the kind of interfaces you don't expect to change even when refactoring.
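Python's standard `doctest` module is one existing take on this idea: the usage examples live in the docstring and are executed as tests. A minimal sketch, where `slugify` is a hypothetical function made up for the example:

```python
import doctest


def slugify(title):
    """Turn a title into a URL slug (hypothetical example function).

    >>> slugify("Why Most Unit Testing Is Waste")
    'why-most-unit-testing-is-waste'
    >>> slugify("  spaces   everywhere  ")
    'spaces-everywhere'
    """
    return "-".join(title.lower().split())


if __name__ == "__main__":
    # Reports a failure if the documented examples drift from the code.
    print(doctest.testmod())
```

The examples document the public interface only, so they should keep passing across any refactor that preserves behaviour, and they tell you when the documentation has gone stale.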
> writing your code in a way that makes it easy to test generally also improves the API and design of the entire system
You can get the same result by applying SOLID principles, without the cost of maintaining extra code. The presence of unit tests doesn't guarantee easy-to-test code either (and this is the worst-case scenario: badly designed code with a lot of test coverage that breaks on every refactor).
> is it possible that tests get in the way of completing a project in a timely manner?
Are you dismissing this as a point? Time is a resource and writing unit tests adds to that cost.
Unit testing needs to:
* Make you faster because there are fewer bugs in the long run.
* Prevent more money being lost to bugs than is lost to the opportunity cost of writing unit tests instead of new features.
Also, developers who write unit tests are less inclined to refactor when requirements change, as doing so requires refactoring all the affected unit tests too.
You can agree or disagree with the points in the article, but it is NOT just "bullshit".
Unit tests verify logic; compilers only guarantee syntax. Yes, that handles a huge set of issues that in a dynamic language would require 100% test coverage, but you still need to check the logic of the code. Thus, unit tests are still a huge benefit.
One reason I love Rust is the built-in #[test] feature. Built-in tests from the ground up, awesome.
Saying you're a Rust fan and then saying compilers only guarantee syntax when the entire point of that language is to have a constrained subset in which types guarantee no data races/leaks at compile time doesn't make sense to me.
Rust is the greatest thing since sliced bread, my comment was only about statically typed and compiled languages in general. I think you read more into my comment than I intended.
The only difference is that Rust's semantics is waaayyy more strict than C's.
I'm not saying that testing isn't needed in either place, but a lot of the times "enterprise" practices that leak into scripted environments make code far more complicated and difficult to understand than it needs to be.
Not every statically-typed language is Java. ;)
Though, I will say writing tests against and with a dynamic language tends to be far easier when things are written with modularity in mind.
Maybe you haven't seen it. But keep in mind that consultants get called in when something has gone wrong. They tend not to see clean codebases by teams that are working well.
It is not a guarantee of anything beyond that. However, having something close to automated deployment means that you should have very close to 100% coverage if only to ensure the above two things.
One thing I've railed against at several points, more so in scripted environments, is DI/IoC: it's often not needed, and there are often simpler solutions than what is generally used. The benefits are being able to have multiple targets for an interface, and being able to more cleanly unit test some systems in a strongly typed compiled language.
All of that said, I don't always go for thorough testing, but I try to write code in such a way that it's more modular, and would be easier to test, should the need arise.
The benefit for DI/IOC for me is mainly not having to worry about how to compose the system.
If some new module I'm writing needs a particular dependency, then I just ask for that in my constructor.
I neither need to know nor care how to set it up or initialise it. That's taken care of by the container - either automatically (convention based registration) or by the maintainer of FooService.
This makes it dead easy to create smaller integration/test harnesses: run the normal container build step, and then resolve an instance of my class, out it pops with all the dependencies handled.
Well... they should be. The number of Tests I've seen that are functionally useless is far too high.
Because what you test is much more important. I know I'm not imparting new wisdom here but if your test can't survive a refactor it's probably a) far too fragile b) poorly written and c) testing the wrong thing.
I actually would not be surprised if a large percentage of Unit Tests are useless, other than coverage stats, but I agree with you that this article is very full of hyperbole to the point of ruining any point it might have had.
That's not really what conflict of interest means, but it's certainly possible that his being in that position gives him a different set of priorities than, say, a project manager, and he's working backwards to come up with an argument that justifies his opinion.
Nope. They may be evidence, but they aren't proof. Proof comes from formal systems like type systems.
Doubt so, he works for a software testing consultancy. They've been around since 1994 apparently. http://rbcs-us.com/
The article is very reasonable, logical and a good counterpoint to the countless TDD anecdotes which obsessively focus on unit-tests to the detriment of any other types of testing or design practices.
But then the separation of concepts is driven by the testability, not the problem per se. It means that there's possibly a lot of complexity just for supporting testing.
* a little, if it passes a decent unit and/or integration test suite;
* a fair bit more, if it survives a round with humans who didn't write it but did set out to break it;
* quite a bit more, once it's been in production for a while without complaints tracing back to it;
* gradually approaching, but never reaching, perfect trust the longer it survives exposure to the world.
A lot of bugs will make it past step 1--even through tests with 100% code coverage. Hopefully no one takes them as anything near proof of correctness.
The drawback of this is that the test suite grows fast and needs to be groomed pretty often, but it's a fair price to pay, and in the end it's faster for me and produces far fewer bugs.
IMHO, this is a failing of the type system. If you have functional purity, you don't have a need for unit tests, as they are blended into the code you write. Functional purity gives you the power to formally verify the correctness of your program.
or link bait
They informed me that they had written their tests in
such a way that they didn't have to change the tests when the
Why unit tests are good:
- You get well-tested parts that you can use in your integration tests, so that the integration tests truly catch the problems that couldn't be caught at a lower level. This makes trouble-shooting easier.
- Decoupled design - one of the key advantages of TDD
- Rapid feedback. Not all integration tests can be run as quickly as unit tests.
- Easier to set up a specific context for the tests.
There are more details in the blog post I wrote as a response.
The debate about whether UT or system tests or something in the middle is better is missing the point. A test should be understandable at any level. 5+ mocks per test generally doesn't help the next guy understand what you are trying to test.
If you can abstract your system behind an API to drive and test it, you'll have much longer lasting tests that are more business focused and importantly are clearer for the next person to understand.
I can see great value in identifying the slow and rarely failing tests and running them after the quick / more information producing tests. Is there any CI support for such things? I know TeamCity can run failing tests first...
I became a 'convert' after having to clean up a fairly large mess. Without first writing a bunch of test code there would have been no way whatsoever to re-factor the original code. That doesn't mean I'm a religious test writer and that there is 150% test code for each and every small program I write. But unit testing when done properly is certainly not wasteful, especially not in dynamic languages and in very low level functions. The sooner you break your code after making changes the quicker you can fix the bug and close the black box again. It's all about mental overhead and trust.
Unit tests are like the guardrails on the highway: they allow you to drive faster, confident that there is another layer that will catch you in case something goes wrong, rather than ending up in the abyss.
Yes, I've seen thousand line files of boilerplate unittests that don't actually say anything useful about the system. I've also written unit tests that tell me in 2 minutes rather than in 3 weeks that somebody has broken my code.
If your standard for a system of testing is that it guarantees that people can only write good code, you're insane.
The ideal case is that your codebase is entirely made up of code that never fails and tests that always pass. Obviously sometimes you are going to have tests that fail and introduce bugs that cause tests that used to pass to fail. But that's the reason that you write those tests, to find those problems.
The author gives the silly example of a method that always sets x to 5, and a test that calls it and makes sure x is now 5. That seems like a bad test but anyone who's actually done work as a developer understands why it isn't. If you skip the tests that are simple and straight forward and seem like a waste of time and only write more complicated tests then you will have a hard time reasoning what failed when the complicated test fails. Was your x = 5 method faulty? You don't think so but you don't have proof since it wasn't tested. Having the test, as silly as it seems, lets you know that method is working.
Anyone who has been on a team that skips easy/simple tests knows what a mistake it is. And if you don't, you will eventually.
A common side effect of writing these tests is discovering missing or conflicting cases. A recent example for me in eCommerce is our algorithm for determining if a particular product can be added to your cart. Things start simple (is it in stock?), but they get complicated really quickly. Is there a dropship vendor with stock? Do we expect to receive more stock in the next 24 hours? Is it discontinued? Clearance? Does it look like we'll run out in the next day and accidentally oversell the stock we have? (realtime inventory is a little fuzzy still).
When we unit test the module that handles all of this we don't care about internal state or exhaustive input testing; we care about whether the user sees "In Stock" or "Out of Stock" for the situations we encounter.
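A test in that spirit might be laid out as a table of situations and the label the user should see. The sketch below is Python, and the `Product` fields and the availability rules are hypothetical, invented for the example; they are not the real rules described above.

```python
from dataclasses import dataclass


@dataclass
class Product:
    on_hand: int = 0
    dropship_stock: int = 0
    restock_within_24h: bool = False
    discontinued: bool = False


def availability_label(p):
    """Hypothetical rules, made up for this sketch; returns what the user sees."""
    if p.on_hand > 0 or p.dropship_stock > 0:
        return "In Stock"
    if p.restock_within_24h and not p.discontinued:
        return "In Stock"
    return "Out of Stock"


# Each row: (situation, product state, label the user should see).
CASES = [
    ("plain stock", Product(on_hand=3), "In Stock"),
    ("dropship only", Product(dropship_stock=1), "In Stock"),
    ("restock due tomorrow", Product(restock_within_24h=True), "In Stock"),
    ("discontinued, last units", Product(on_hand=1, discontinued=True), "In Stock"),
    ("discontinued, none left",
     Product(restock_within_24h=True, discontinued=True), "Out of Stock"),
    ("nothing anywhere", Product(), "Out of Stock"),
]


def test_availability():
    for name, product, expected in CASES:
        assert availability_label(product) == expected, name
```

Writing the table out is where the missing or conflicting cases tend to surface: a row you cannot fill in with confidence is a question for the business, not for the code.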
I have written a very large amount of Java code in my career, but after having spent a lot of (personal) time on a Common Lisp project (web application) I can safely say it's still possible to build modern applications using a bottom-up approach. I recommend people try it, it can be quite refreshing.
As I mentioned in another reply, I intend to do some writeups on this stuff, but unfortunately it doesn't have as high priority as it should. But at least a video or two should be doable soon enough.
I have written a few blog posts about the architecture, but unfortunately not too much on the web part. I intend to write some more, and also make some videos showing how nice it is to develop a web application in the same process as the server is running.
Developer Response: Keep tests the way they are. Puff up shipping code by adding layers that just check and transpose arguments; enough of these layers and you get your 70% metric of code that's basically guaranteed not to have issues, and check in.
My response, trying to take that stuff and port it: Rip out about 75% of the junk code while cursing clueless management and developers who, ultimately, didn't write very good tests (none of the tests ran anyway, because they required an elaborate lab setup that involved undocumented APIs, installs of binaries meant for OSes that were nearly end-of-life, a SQL server and a bunch of other shit infrastructure. Beware test environment creep).
Any links related to it will be helpful.
> When I look at most unit tests — especially those
> written with JUnit — they are assertions in disguise.
I'd always assumed the point was to run these assertions at 'test-time', prior to distribution, and not have that code in the 'real' program.
Besides, most (?) of the time we probably want to fail more gracefully than that. (Okay we could `except AssertionError`, but typically it's going to be better to return something else, or raise and handle a more specific exception.)
That's the point of assertions, as they are usually not included in release mode builds in compiled languages. In Python you could just write an assert function that does nothing if some global "release" variable is set to true.
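A minimal sketch of such a function, with `RELEASE` and `check` as made-up names. Worth noting that Python already has this built in: running the interpreter with `-O` sets `__debug__` to false and strips `assert` statements entirely.

```python
RELEASE = False  # hypothetical global switch, flipped for production builds


def check(condition, message="assertion failed"):
    """Raise AssertionError on a false condition, unless in release mode."""
    if RELEASE:
        return
    if not condition:
        raise AssertionError(message)


def mean(values):
    check(len(values) > 0, "mean() needs at least one value")
    return sum(values) / len(values)
```

One difference from the built-in statement: the argument to `check` is still evaluated in release mode, so an expensive condition costs you either way, whereas a stripped `assert` costs nothing.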
You only gain something from unit-tests if they test something that can't be equally well tested by writing the assertions into the regular code and running the software with typical input.
I much prefer this approach. The looser coupling gives you more freedom to undo your initial architectural mistakes.
You also can't effectively do unit test driven development on big balls of mud.
I dislike Gherkin-based languages though. The syntax design was not particularly well thought through.
> That means that tests have to be at least as computationally complex as code.
My BS sense is tingling. No matter how complex the code, in the end it comes down to comparing the output of a function (or state after execution) against what you expected.
Granted, OO regularly leads to design where unit testing goes straight to hell and after way too many lines of test-setup you essentially only test that your mocking framework works. But spare me these incorrect blanket statements – they don't help – thank you.
"... Large functions for which 80% coverage was
impossible were broken down into many small functions for
which 80% coverage was trivial. This raised the overall
corporate measure of maturity of its teams in one year, because
you will certainly get what you reward. Of course, this also
meant that functions no longer encapsulated algorithms. It was
no longer possible to reason about the execution context of a
line of code in terms of the lines that precede and follow it in
Unit tests which break code are stupid. Refactoring is good, but just splitting a large function into smaller pieces does nothing to improve the value of code unless it's done so that there is an understanding of the algorithm available and communicated.
Everything can be abused if not used skillfully.
For blackbox mode, I am not that convinced that it is the proper strategy, especially when the product is being built incrementally. The typical example is when an entity state is being modified in a UC and the function to test the state is not yet developed. I'd prefer in that case to have the test verify directly in the DB that the state has been properly updated.