In the past, while contracting, I was usually asked to include estimates for tests in my proposals.
The tests failed to be useful, simply because they were written after the feature was actually implemented! We knew better, of course, but this was done in order to quickly get builds a client could look at.
Then, when clients wanted to add or change features, guess what got cut to make up the time? That's right, the tests!
So the tests were always secondary, and the projects tended to suffer as a result.
Recurring "fixed bugs" cost more than just working hours to fix.
In the eyes of a client or customer, they are so much worse than a shiny new bug.
Tests can help catch recurring bugs before a client/customer does - and save you not only time,
but from losing your customer's confidence.
Now, I'm building my own app and I'm using a disciplined TDD approach.
I didn't start my project this way, as it seemed overkill when it was just me, but I saw early on that not practicing TDD, even solo, was madness.
It is taking longer, but my actual progress is consistent, and I'm already far more confident about the stability of the app.
At two companies, and across three teams where I've worked as a developer responsible for automation, I have found that writing integration and system tests after feature implementation, and properly reporting results on high-frequency check-ins, has allowed me to have a significant impact on the way we work, to the point where some of the leads and managers are actually taking testing more seriously than they used to. It's still hard to get rid of the idea that tests are secondary, but they're no longer so eager to cut them to save time, having seen how tests let us identify problems faster, and therefore fix them faster.
In this article the author seems to be writing mostly from the perspective of a single developer or contractor, but for bigger teams tests written after feature implementation are still important, so based on my experience I would say don't discount them. That said, his approach to writing tests is something I agree with: rather than rigidly sticking to a particular methodology he is pragmatic, and that's better than doing nothing.
Also, he doesn't mention this, but solid reporting is so important. I am involved with Tesults (https://www.tesults.com), a test results reporting application designed for teams of 10 or more to integrate into their build/test system. If you work on a team of this size and are serious about automated testing, check it out. Send me an email if you're thinking of introducing it to your team and have any questions or feedback.
Yes, you can construct a regression test suite out of tests written after implementation, and this test suite will hopefully catch incorrect or incomplete alterations made to the software later.
However, the _reason_ it can do that is because these tests will have been written, ideally consciously _because_ of this underlying reason, with the original design of the software in mind; they will be essentially a written record of what the software was _intended_ to do and was designed to do. If one of these tests fails, it means that a design element has been incorrectly or incompletely altered; a well-written unit test of this sort will have an easy to follow flow which will, upon failure, clearly communicate to the programmer which original design element he has not correctly altered and under which specific conditions the error actually occurs.
Writing good tests after implementation can indeed encode a hugely significant amount of information. (Of course, it should also test intended behavior with respect to a wide range of both good and bad inputs, but that is beside the point I want to drive home here.)
Well enough written unit tests can be used as documentation of an application's design!
[N.B. The above thesis really only holds if the authors of the original code write unit tests, ideally as they develop, while any design considerations are fresh in memory. As discussed in another comment, sending other engineers back later to try to figure out what the design was, and then unit test their interpretation of what they think it must have been, will almost certainly not give anywhere near the same quality.]
That sounds really cool. Is there an open source project that you respect that has achieved this, that you can point me to? I'd like to look at it.
I find after-the-fact asserts, unit tests, and static analysis still useful:
- I catch incorrectly handled edge cases that I haven't hit "in the wild" yet (although others may have!)
- I think of new edge cases to handle
- I can be more aggressive in future refactoring
- I can be more confident in excluding the code from my bug hunts
Many of the issues I catch this way are the kind of issues that threaten to turn into really nasty heisenbugs - such as the occasional missing mutex lock. After all, all the low hanging fruit was probably caught when testing the feature locally ;)
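As a sketch of the kind of after-the-fact assert I mean, here is an invented example: a method that must only run while its lock is held asserts exactly that, so a caller that forgot the lock fails loudly in tests instead of becoming a rare heisenbug. (The `Counter` class is made up for illustration.)

```python
import threading

class Counter:
    """A counter whose internal increment must only run while holding its lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def _increment_unsafe(self):
        # After-the-fact assert: catches callers that forgot the lock
        # during testing, long before it shows up as an occasional
        # corruption in production.
        assert self._lock.locked(), "caller must hold the lock"
        self._value += 1

    def increment(self):
        with self._lock:
            self._increment_unsafe()
```

A direct call to `_increment_unsafe()` without the lock trips the assert immediately, which is far easier to debug than a sporadic race.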
Sometimes, some unit tests border on functional tests and vice versa, the distinction blurring between two adjacent test cases in a unit test suite.
And sometimes integration suites end up testing single functionalities at a time, depending on the design and/or requirements of the application, and could reasonably be called functional tests.
On my team, I hesitate to emphasize "functional tests" as a standalone group equivalent in semantic distinction to unit and integration tests; the proper scope of a "functional" test is defined, to me, by the way the functionalities of an application were designed and decoupled, which depends on the particular project.
(If your organization uses functional requirements and specifications for software design, it's simpler - a functional test verifies a functional requirement. Unfortunately, having functional requirements as a part of the engineering process is significantly rarer than it should be.)
Of course, writing the tests before the actual source code of a feature ensures that it's testable, but that's mainly a side effect; there's nothing inherently better about it.
The microdesign flaws that stem from TDD (described in ) were real in my case, and my code is far better in the long term with the unit tests written afterwards, because I tend to modify my design a lot while writing code.
If you have a good design in mind before starting development, which is, after all, an ideal situation, it could be argued that it is reasonable to write tests after most of the development, not only to verify robustness to bad input as per the usual, but also as a way to encode a demonstration of the intent of the design.
Regarding solo TDD, it is slow going at first, but you quickly make up that time later on, when you add code, break your tests, and can quickly jump in and fix the issue. Saves your reputation too, I think.
This is a nice ideal but in practice can be really hard. For example, say I fix a grammar mistake in a text label (or, hard mode, a code comment). One could write a test, perhaps integrating a grammar checker, but this is a lot of work, for a low reward. So where do we draw the line?
It's easy to test things that have inputs and output representable as binary blobs. It's hard to test that this animation is smooth, that the build succeeds on obscure systems, that this graphic is rendered acceptably, that this event happens when the user connects that device. Or, rather, it's easy for a person to manually test any of these things, but hard to write an automated test for them.
A common failure mode is to capture the input and thereby isolate the system. If 95% of your bugs are due to interactions with your dependencies, isolating your system is going to find very little compared to a full integration test. This is a great way to make the tests pass, and also make the tests useless.
Maybe we conceptualize testing on two axes: automated testability and manual testability. If you try to write tests for components that have poor automated testability, you'll hit a valley of pain: spurious failures, excessive mocking of your dependencies, etc., that you can only climb out of with great effort (e.g. a robot inserts an HDMI cable). If these components have good manual testability, then it may be more cost effective to just test it manually.
As the article notes, writing all tests first is unreasonable because you won't know all implementation details until you have to make them; the tests I write first are thus functional tests, nowadays with cucumber.
Writing tests after coding is lacking, philosophically, because you often spend your time defining the abstractions and then just rewriting a verification of that abstraction in tests, plus some null checks.
The balance I've been using has been to write tests for abstractions I come up with, one by one. If an abstraction is decoupled and encapsulated, the unit tests come naturally. If I have to write a lot of mocks for an abstraction, that often tells me it isn't cleanly decoupled or simplified.
Furthermore, as you write tests as you go this way, you often find yourself writing the same support code more than once, at which point you notice it and find abstractions in that support code; this ends up explicitly giving you a view of what conscious and subconscious assumptions you have about what inputs you are expecting and what assumptions you have made. This is often enlightening.
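A hedged illustration of the mocks-as-a-smell point above, with invented names: when a rule is expressed as a pure function of plain values rather than reaching into collaborator objects, the unit tests need no mocks at all.

```python
# Decoupled design: the pricing rule is a pure function of plain values,
# instead of a method that reads an order object and a clock.
# (discounted_price and its rules are invented for illustration.)
def discounted_price(price, is_member, day_of_week):
    """10% off for members, plus an extra 5% off on Mondays (day 0)."""
    rate = 0.10 if is_member else 0.0
    if day_of_week == 0:
        rate += 0.05
    return round(price * (1 - rate), 2)

def test_member_monday_discount():
    # No mocks needed: the unit test falls out of the decoupled design.
    assert discounted_price(100.0, True, 0) == 85.0

def test_non_member_weekday():
    assert discounted_price(100.0, False, 3) == 100.0
```

If this rule instead required mocking an `Order`, a `User`, and a system clock, that would be the hint that the abstraction isn't cleanly decoupled yet.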
This is good advice.
On a previous (technical-debt ridden) project I did a little measuring and there was a pretty clear hierarchy of test value - in terms of detected regressions:
1) Tests written to invoke bugs.
2) Tests written before implementing the feature which makes them pass.
3) Tests written to cover "surprise" features (i.e. features written by a previous team that I never noticed existed until they broke or I spotted evidence of them in the code).
4) Tests written after implementing the feature.
5) Tests written just for the sake of increasing coverage.
Surprisingly 5 actually ended up being counter-productive most of the time - those tests detected very few bugs but still had a maintenance and runtime overhead.
On all projects I own, the policy is that a bug fix will not be merged into a codebase without comprehensive unit testing demonstrating the case in which that bug was discovered, and that it has been resolved.
I do not understand why it matters _how_ the bug was discovered. If fuzz testing discovered that function foo tries to dereference a null pointer given input "ABABAB", then I would expect the engineer who chose to address that bug to investigate which property of "ABABAB" is unaccounted for, account for it, and then write a unit test calling foo with input "ABABAB", along with several other inputs that share the same discovered underlying property.
Fuzz testing may be a different method of testing, but the end result is, regardless, that you have discovered an input that your application hasn't been designed to handle properly and that needs to be demonstrably fixed, whatever it may be in particular.
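As a minimal sketch of that workflow, with invented names standing in for foo and the fuzz-discovered crasher: the fix is accompanied by regression tests that pin the literal discovered input plus other inputs sharing the same underlying property.

```python
# Hypothetical: suppose fuzzing revealed that parse_pairs() crashed on
# "ABABAB" because it mishandled inputs made of repeated pairs. After
# fixing the underlying property, we pin it with deterministic tests.
def parse_pairs(s):
    """Split a string into two-character chunks; reject odd lengths."""
    if len(s) % 2 != 0:
        raise ValueError("input length must be even")
    return [s[i:i + 2] for i in range(0, len(s), 2)]

def test_parse_pairs_repeated_pairs():
    # The fuzz-discovered input itself...
    assert parse_pairs("ABABAB") == ["AB", "AB", "AB"]
    # ...plus other inputs sharing the same property (repeated pairs).
    assert parse_pairs("CDCD") == ["CD", "CD"]
    assert parse_pairs("") == []
```

The regression suite stays fast and 100% reproducible; the fuzzer's job ends once its finding has been converted into explicit cases like these.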
I actually don't think that heavy fuzzing has a place in an automated test suite at all. Test suites should be fast and 100% reproducible at all times. Then explicit regression tests for the discovered cases are the only way. (I do occasionally allow myself to include short fuzz loops with fixed RNG initialization, but those are more on the "shameful secrets" end of the spectrum)
Put another way, those branches that are rarely touched or hard to get to can become a surprise when they are actually reached in some unique situation.
I guess in this case it isn't really the test itself that is useful, but rather that the requirement to at least hit each branch pushes me to design and organize the code better.
I once made the mistake, as a new lead, of implementing a 100% code coverage policy; I thought that I was expressing the intent of covering all possible behavior. What ended up happening is that the team focused on the metric and lost sight of the goal of unit testing, which is to test behavior. We ended up with people submitting PRs containing unit tests for object getters and setters but not testing that trying to set a null value is properly handled.
That experience taught me that code coverage is only a tool, and a tool is only useful if used correctly.
To drive the point home: having 100% coverage of all branches means that all written code is tested, but it does not mean that all code that needs to be written has been written. Unit tests should verify behavior, not just execute whatever code has been written.
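A small, hypothetical illustration of the gap between line coverage and behavioral coverage: one happy-path test executes every line of the function below, yet says nothing about the unhandled null case.

```python
# set_name is invented for illustration. It has a latent bug: no check
# for name being None, so callers get an AttributeError at an awkward
# distance from the real cause.
def set_name(record, name):
    record["name"] = name.strip()  # raises AttributeError if name is None

def test_set_name_happy_path():
    record = {}
    set_name(record, "  Ada ")
    assert record["name"] == "Ada"
    # This single test yields 100% line coverage of set_name, but zero
    # coverage of the behavior we actually need when name is None.
```

The coverage report shows 100% either way; only a behavior-focused test for the None input would expose the missing check.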
Sometimes my tests just start as a bunch of prints to see the results visually. Then when I'm happy with the results I convert these prints to assertions and the playground becomes a real test suite.
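For example, a sketch of that progression (`normalize` is a made-up toy): the print call that I eyeballed during development gets frozen into an assertion once the output looks right.

```python
def normalize(xs):
    """Scale a list of numbers so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]

# Playground phase: eyeball the output.
# print(normalize([1, 1, 2]))   # looked right: [0.25, 0.25, 0.5]

# Once happy with the printed result, convert it into a real test:
def test_normalize():
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]
```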
The underlying concept is that unit tests should verify behavior of minimal units of functional code. If you are running the application, you are testing much more high level functionality. "Running main" and "running unit tests" are completely different things, and I would rather the principle behind this difference be the takeaway, rather than just "avoid running main."
It's a motivation thing, and I think it can be a very valuable tip for motivating yourself to keep up your unit test suite (for those of us who need this behavioral trick).
> However, it’s also largely-wasted effort! A manual test only verifies the current state of the code base.
I don't think this is true, at least for GUIs, because manually using the UI validates that it works for a human. An automated GUI test really only covers a tiny, tiny subset of the interaction between the human and the machine (yes, ideally "someone else would do it", but how many companies have you worked for that actually have UX reviews and dedicated UX testers? Not to speak of open source).
Also and especially for larger applications I strive to get to the dogfooding point as fast as possible, because dogfooding provides invaluable feedback, not only for GUI and UX, but also for the sort of issues that tend to get overlooked in tests and reviews, eg. leaks in long-running processes, or structures that don't scale well. Dogfooding also tends to mean that the overall structure becomes relatively stable quickly, which makes writing a comprehensive test suite easier at that point, since you don't have to constantly work on both tests and implementation, but can concentrate on either.
Writing this reply made me think that I'm probably even less of a TDD person than I thought before.
If the effort of writing a few tests is smaller than actually running the application, the tests will be written because developers are lazy (instead of the other way around).
In the good old days of massive JBoss EE applications, this was taken care of automatically by the app server :)
Particularly enjoyed the emphasis on regressions. I converted to testing when working on a relatively complex data transformation. This was replacing an existing, scary data transformation process that was hard to test (we'd run new code for a few days and do a lot of manual examination), so I made extra certain to design the new system so it was testable. Catching regressions in test, especially for data processing, is just so much better than catching and repairing them in production.
> I’ve never really subscribed to any of the test-driven-development manifestos or practices that I’ve encountered.
I feel the exact opposite. I've worked on projects with a lot of legacy code, both with BDD and with UTs that we added later on.
Even with the best intentions the latter always failed: we always ended up doing a lot of unreadable tests that had no meaning and that we were afraid to look at.
However, when I was working on a team fully committed to BDD, we looked at the tests before looking at the code, the tests were at the center of the development process, and we were able to write fast, solid, and simple tests.
Nowadays, I'm more interested in articles that understand that tests can be a pain too. And tbh I don't really trust articles that aim at high coverage without talking about the different challenges that come with tests.
Which, to be fair, is a highly nontrivial task that will realistically never be completed as well as if the original authors had written unit tests demonstrating the intent of their design. And the comparison you should have been making is to that scenario.
I do wish the reasoning had been explained to me far earlier, as I might have recognized the testing as useful and not just another strange requirement.
I do a mixture of traditional unit tests w/asserts along with visual feedback. I have a rig set up that will dump final (or intermediate) results of operations to the screen with several tests being presented at the same time.
If I'm actively developing something and am going to be spending a lot of time in the debugger I can solo a test. Having the additional visual feedback makes everything go a lot faster.
For higher level stuff having a number of tests on screen at once gives visual feedback about regressions, which again speeds things up a lot.
This combined approach is the most useful I've found so far.
After the fact unit tests that verify that abs(expectedImage - resultingImage) < e are also helpful. But unless you write some modeling application that has to produce exact results, a lot of code is typically about what looks good, not about what's mathematically correct. Those parts are probably impossible to properly test automatically.
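A hedged sketch of such a tolerance-based comparison, with images flattened into plain lists of 0-255 pixel values for illustration (a real version would operate on image buffers):

```python
def images_close(expected, result, eps=2):
    """True if every pixel differs from the reference by at most eps,
    i.e. abs(expected - result) < e applied per pixel."""
    if len(expected) != len(result):
        return False
    return all(abs(a - b) <= eps for a, b in zip(expected, result))

def test_render_matches_golden():
    golden = [0, 128, 255, 64]      # stored reference ("golden") output
    rendered = [1, 127, 255, 65]    # hypothetical renderer output
    assert images_close(golden, rendered)
```

The tolerance `eps` absorbs harmless platform-level differences in rounding or antialiasing, while still flagging real regressions; what it cannot judge is whether the result "looks good".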
Whatever computations you are making should be composed in such a way that distinct, reused computation types are implemented in units, perhaps conceptually one type of transformation per function. Each of these functions should be tested by a collection of unit tests which verify that it performs the correct computation on hard-coded, mock data, both good and bad, and that it fails when it should.
In cases where multiple transformations are applied together and there is more complex behavior, the application should be designed so that those combinations of "unit" behaviors are done for a specific purpose, with a specific goal (e.g. warp this image and then project its shadow); you should have an integration test suite that verifies this more complex behavior performs the function it is supposed to. Here, you can probably be less strict about checking inputs since bad inputs should be covered by unit tests, but you should verify that there is proper error handling of multiple units failing.
These should all run using mock data as part of your CI process, and no code should ever be merged that has failing checks; new code should not be merged without code review verifying that testing is comprehensive.
Finally, there should be a QA process that verifies the final product when being considered for release, using real data and manual validation, ideally with acceptance criteria in hand that was written before the code changes were begun.
In the end, it shouldn't even really come down to trying to mock your APIs; your abstractions should, if at all possible, be decoupled enough that they perform transformations of data that they are given, regardless of whether it came from another API or not - so instead of considering "mocking the APIs" you call the relevant functions with example data of the sort you receive from the APIs that will be actually used.
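As a minimal sketch of that idea (`shadow_offsets` and the payload shape are invented): the transformation is a pure function over data shaped like what the API returns, so no API mock is needed at all.

```python
# The unit under test is a pure transformation of the data it is given,
# regardless of whether that data came from a live API or not.
def shadow_offsets(points, dx, dy):
    """Project each (x, y) point by a fixed shadow offset."""
    return [(x + dx, y + dy) for x, y in points]

def test_shadow_offsets_with_api_shaped_data():
    # Example data with the same shape the real API would return:
    api_payload = [(0, 0), (2, 3)]
    assert shadow_offsets(api_payload, 1, 1) == [(1, 1), (3, 4)]
```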
I suspect there's not much formal testing (at least done or required by Linus, some external projects may be available).
So it seems that testing isn't that necessary for a quality project? On the other hand Linux has a large community so maybe that substitutes for a comprehensive test suite?
Ideally, tests should take three steps:
- set up the state the test needs
- perform a single operation (the test)
- check the operation result is the expected one
If a change in functionality is big enough to require changing a lot of tests, this structure makes it obvious that the behaviour is changing in lots of cases, which is good to keep in mind. Most times a simple editor replace suffices, but it shows how big the change is.
Refactoring should be focused on making those steps easy to write and read...
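A minimal test in that three-step shape might look like this (`ShoppingCart` is a made-up example):

```python
class ShoppingCart:
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)

def test_total_sums_item_prices():
    # set up the state the test needs
    cart = ShoppingCart()
    cart.add("tea", 3)
    cart.add("mug", 7)
    # perform a single operation
    result = cart.total()
    # check the operation result is the expected one
    assert result == 10
```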
I suppose this is another case where a common approach is given a name so that people can refer to it to each other and pass it onto others. I wouldn't have thought to name this one, though - I would rather teach a junior engineer how to think about what a test should do, instead of telling him to do "AAA".
With experience, I have seen the effect of something being given a name eventually leading to less experienced people treating it as a universal principle and wielding it as if all they have is a hammer, perhaps before having formed the habits of checking assumptions before making decisions, etc.
Allowing your application logic to do this for you creates all sorts of opportunity for hard to debug issues that have nothing to do with the thing you were actually testing. Putting data into the test DB directly makes for more setup, but makes for much more isolated test cases and much more certainty that your tests are doing what you think they're doing.
It also means you can write tests as you go without having to build a whole level of application logic to support your single test.
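For instance, a sketch using an in-memory SQLite database (the schema and the function under test are hypothetical): the fixture rows go straight into the table, rather than through the application's signup logic.

```python
import sqlite3

# Hypothetical function under test.
def count_active_users(conn):
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE active = 1").fetchone()[0]

def test_count_active_users():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, active INTEGER)")
    # Direct inserts: more setup code, but the test no longer depends on
    # application logic that has nothing to do with counting users.
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("ada", 1), ("bob", 0), ("eve", 1)])
    assert count_active_users(conn) == 2
```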
It is not a chore to unit test code, it is an extremely useful development tool that forces you to check your assumptions and verify your design as you go to give you a better product.
The attitude of this comment is along the lines of the sorts of attitudes towards testing that take a lot of effort to coach junior engineers out of.
However, everything has a cost, even abstractions. The more powerful the abstraction, the higher the cost. I find that for most of our projects, the sorts of abstractions unit testing forces on us is much more powerful - and thus higher cost - than necessary for the task at hand. If you are not taking advantage of that power, then all you really have left is the high cost.
Then some poor fellow ends up redesigning the test suite to actually follow the assumptions of a unit test's test data being only ever accessed by that unit test. I have been that fellow, and it can be, bluntly, quite a pain in the ass; as you're doing it, you can't help but think, "whoever wrote this was too lazy to set up the test data properly, I shouldn't have to spend the time to fix this now."
I'd go as far to say that attempts at DRY cause more problems in test suites than copy-paste coding, many of which never even get caught. In a test suite, this is very bad.
We've since gone more DAMP (http://stackoverflow.com/questions/6453235/what-does-damp-no...): tests should be descriptive and meaningful by themselves, not abstract, concise and DRY.
It makes for more lines of test code but they are simpler to understand and adapt.
This is why I commented elsewhere that DRY should be applied not as a coding principle to all code, but as a design principle to software design.
I won't meta-program for tests, but I will do things like make a list of classes, or symbols or whatever, to pass to a loop. Just keep it simple.
This is a case where metaprogramming for tests came in handy.
I had a bug recently that involved someone making a change that violated an invariant property of a class. To codify this invariant, I was tempted to do what you did, to make a list of symbols to feed into my test to ensure the invariant was obeyed. However this bug was caused precisely by someone adding a new symbol, a method, that didn't obey this property. The test using this design wouldn't catch this failure. I instead opted to do some introspection (it's Python, so it was dead simple) on the class to ensure all of its methods obeyed this invariant. It took a little extra time to implement but in the end it worked.
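A hedged reconstruction of that approach (`AuditedStore` and its docstring-based invariant are invented stand-ins for the real class and property): introspecting the class means a newly added method is checked automatically, where a hand-maintained list of symbols would silently miss it.

```python
import inspect

class AuditedStore:
    def put(self, key, value):
        """audited"""

    def get(self, key):
        """audited"""

    def delete(self, key):
        """audited"""

def test_all_public_methods_are_audited():
    # Discover every public method via introspection, instead of keeping
    # a list of names that goes stale when someone adds a method.
    methods = [fn for name, fn in inspect.getmembers(AuditedStore,
                                                     inspect.isfunction)
               if not name.startswith("_")]
    assert methods, "expected at least one public method"
    for fn in methods:
        # The (invented) invariant: every public method declares itself
        # audited in its docstring.
        assert fn.__doc__ == "audited", fn.__qualname__
```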
A test should be dead simple to read and understand when someone else new to the project needs to understand what it tests. Further, when a test fails, the output should clearly indicate exactly on what line of code an error occurred.
Metaprogramming tests feels to me like a case of a desire for or predilection for cleverness getting in the way of what the task is actually for.
Tests should tell a story. They should not be subjected to the same methods of abstraction used in the code they themselves are supposed to test and verify.
Likewise, though I view it as a deficiency of a testing framework if it requires metaprogramming for parameterized testing. It shouldn't have to.
If you need a loop, ideally you write the loop using the language, rather than using a test framework's special loop construct. For parameterization, ideally you'd use a function, rather than a special test framework concept for parameterization.
Having duals of every abstraction feature of a language in the test framework increases the cognitive load and ramp up time to become effective in a codebase, and there's always the risk of incomplete knowledge, leading to mixing and matching different abstractions - and test framework abstractions are usually more leaky and less well thought out than language abstractions.
I came to this opinion by way of desiring to be clever in my tests and then finding myself having to insert `printf(input)` inside a loop in order to quickly figure out which input failed.
Code should be written so that it doesn't need to be modified. Aesthetic feelings or desires to be clever should be avoided; they are, ultimately, I think, often just vanity.
The same thing in something like junit is less workable, and copy and paste becomes more viable simply because the language is so clumsy and inexpressive. I've seen various extensions and annotations that let you parameterize junit tests but I don't think they make things much clearer.
A key benefit of driving the test cases off a table of inputs to outputs is that missing cases in cross products are much easier to spot. Visualising the state space and its coverage is easier when you can see a map of the covered space in one block of code.
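A table-driven sketch of that idea, using a plain language-level loop rather than a framework construct (`clamp` is a toy example); the input-to-output table sits in one block, so a missing combination in the cross product is easy to spot:

```python
def clamp(x, lo, hi):
    """Constrain x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))

def test_clamp_table():
    cases = [
        # (x,  lo, hi, expected)
        (5,   0, 10, 5),    # inside range
        (-3,  0, 10, 0),    # below lower bound
        (42,  0, 10, 10),   # above upper bound
        (0,   0, 10, 0),    # exactly at lower bound
        (10,  0, 10, 10),   # exactly at upper bound
    ]
    for x, lo, hi, expected in cases:
        # On failure, the assert message identifies the failing row,
        # so no printf-in-a-loop debugging is needed.
        assert clamp(x, lo, hi) == expected, (x, lo, hi, expected)
```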
The Don't Repeat Yourself principle, speaking from my internalized understanding without going back to wherever it was originally described and named as such, embodies the following general principles:
* A specific functionality should only be implemented in one place; implementing the same logic in multiple places creates the possibility of one or more of those implementations being modified without the rest being modified identically.
* The total amount of code that must be maintained should be minimized reasonably, such as by eliminating duplicate code.
* If the same functionality is being implemented multiple times, that indicates that it may be an essential or otherwise important abstraction in the overall design of an application, and therefore should be abstracted.
That's the gist of it. Those principles apply to software design and architecture. Test code, however, is different in nature. It does not have abstraction and decoupling as a design goal; it is meant to test a software design in such a way as to be able to indicate specifically where an error is occurring. DRY actually works _against_ those goals - if you abstract away a test setup and make a mistake doing so, it is possible, in the course of refactoring all relevant unit tests to use that abstraction, to modify the unit tests to pass. If, in contrast, you keep setup functions entirely contained in each unit test class, you cannot possibly reach this situation, because you can only break one test at a time. The tests retain their function of testing specific source code and only breaking when that source code breaks.
DRY should not be applied without thinking to testing. It can be executed carefully and provide a benefit, but it is difficult to avoid the possibility of adding bugs to the abstractions you choose to make.
We should try to understand why we do the things we do, instead of applying Named Methods as if, because they are named, they must be absolute.
I am being hyperbolic for effect, but I do believe that this is a real effect, and I personally do hesitate to name things that don't really need a name.
I also find this a much more favorable approach than pure TDD. In my opinion, this method is easier to "sell" to other developers.
* you have already done it, or something sufficiently similar, before;
* you have formal functional requirements; or
* you have detailed use cases.
Personally, I've recently come to really favor use cases as part of the design process. The mindset behind them requires some effort to learn, but I find them an effective vehicle for categorically separating what an application must do from any consideration for how it might do it.
1) Make changes
2) Manually test & run automatic tests
3) Write automatic tests for each problem/bug discovered
This only works for decoupled code though. If all units are coupled you must have automatic tests of everything as no-one can comprehend exponential complexity.
The reason is simple: it tests your tests. I have many times found bugs in tests that made them always pass.
(The code was actually already structured for testing, I just hadn't written them because of that coverage number....)
I am still running main, by the way, but that's a different invocation called "system tests" which runs if unit tests pass (and after the coverage report).
Incidentally, OP, I empathize - I had to learn it the hard way too.
EDIT: This explains the concept, and gives a minimal approach to testing (i.e., you should test more than this, but at least this). Of course, there are tools to automate this, but not for every (new) language.
There are many other types of code coverage, including branch/conditional coverage, state coverage and so forth that provide much greater depth of coverage that developers should look into using where possible. The Wikipedia article has a good introduction: https://en.wikipedia.org/wiki/Code_coverage.
Edit: I mean that is there anyone in 2016 who uses testing tools that cannot instrument code to calculate coverage automatically?
I didn't say that. Be wary of people who say they fully tested your code :)
Anyway, there's another (non-waterproof) approach: try to trigger all possible paths through the code (instead of all basic blocks), but the problem is that the number of paths can increase exponentially with code size.
The code-coverage approach, in contrast, is very cost-effective. For example, roughly speaking, it triggers all possible exceptions that your code can throw.
100% code coverage can _never_ be taken, as a figure, to indicate that testing is comprehensive.
Well, unless you count uncaught exceptions from things your code calls.
There are other types of coverage, like branch coverage, which is more like what you've described, and path coverage. Path coverage tests that all independent paths are executed at least once, but it's tedious and memory-intensive to calculate, and hence impractical.
* Code coverage's value: code coverage is not a goal in and of itself. Seeing 100% code coverage should not make you feel comfortable, as a statistic, that there is adequate testing. If you have 100% coverage of branching, you might have indeed verified that the written code functions as intended in response to at least some possible inputs, but you have not verified that all necessary tests have been written - indeed, you cannot know this from this simple metric. To give a concrete example: if I write one test that tests only a good input to a single function in which I have forgotten a necessary null check, I will have 100% code coverage of that function, but I will not have 100% behavioral coverage - which brings me to the following point.
* What to think about when unit testing a function, or how to conceptualize the purpose of a unit test: unit tests should test the behavior of code, so simply writing a unit test that calls a function with good input and verifies that no error occurs is not in the correct spirit of testing. Several unit tests should call the same function, each with various cases of good and bad input - null pointer, empty list, list of bogus values, list of good values, and so on. Some sets of similar inputs can reasonably be grouped into one bigger unit test, given that their assert statements are each on their own line so as to be easily identifiable from error output, but there should nevertheless be a set of unit tests that cover all possible inputs and desired behaviors.
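As a minimal sketch of that spirit - one function, several small tests, each pinning one behavior, including the bad inputs (`mean` is invented for illustration):

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if values is None:
        raise ValueError("values must not be None")
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)

def test_mean_good_input():
    assert mean([2, 4, 6]) == 4

def test_mean_none_input():
    try:
        mean(None)
    except ValueError:
        return
    raise AssertionError("expected ValueError for None input")

def test_mean_empty_list():
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty list")
```

When one of these fails later, the test name alone says which input class broke, which is exactly the specificity described above.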
* Unit test scope: a commenter I responded to in another thread criticized that, by making two unit tests which test cases A and B entirely independent, you fail to test the case "A and B". This is a misunderstanding of what the scope of a unit test should be in order to be a good unit test - which, incidentally, goes along with misunderstanding the intent of a unit test. A unit test, conceptually, should check that the behavior of one piece of functional code under one specific condition is as intended or expected. The scope of a unit test should be the smallest scope a test can have without being trivial; we write unit tests this way so that a code change that later introduces a bug will hopefully not only be caught, but be caught with the greatest specificity possible - test failures should tell the engineer a story along the lines of "_this_ code path behaved incorrectly when called with _this_ input, and the error occurs on _this_ line". More complex behavior, of the sort "if A and B", is the domain of integration tests; integration tests are the tool that has been developed to verify more complex behavior. If you find yourself writing a unit test that tests the interaction of multiple variables, you should pause to consider whether you should move that code into an integration test instead, and write two new, smaller unit tests, each of which verifies the behavior of one input independently of the other.
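One way to picture the split (hypothetical validators, for illustration only):

```python
def valid_user(user):
    return bool(user.get("name"))

def valid_order(order):
    return order.get("total", 0) > 0

def place_order(user, order):
    # The point where conditions A and B actually interact.
    if not (valid_user(user) and valid_order(order)):
        raise ValueError("invalid order")
    return "placed"

# Unit tests: the smallest non-trivial scope, one condition each.
assert valid_user({"name": "Ada"}) is True
assert valid_user({}) is False
assert valid_order({"total": 10}) is True
assert valid_order({"total": 0}) is False

# Integration test: the "A and B" behavior lives here, not in the
# unit tests above.
assert place_order({"name": "Ada"}, {"total": 10}) == "placed"
```

When `valid_user` breaks, its own unit test fails with a precise message; the integration test only confirms the combined behavior.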
* Applying DRY to test setup: if you abstract away test setups, you are working against the express intention of each unit test being able to catch one specific failure case, independently of other tests. Furthermore, you are introducing the possibility of systematic errors in your test suite - and thus undetected errors in your application - in the _very possible_ case of an error slipping into the abstractions you have built into your shared test setup! Moreover, if you find yourself setting up the same test data in many places, that should not suggest to you to abstract away the test setup - rather, it should hint at what is likely a poor separation of concerns and/or insufficient decoupling in your software's design. If you are duplicating test code, check whether you have failed to apply the DRY principle in your application's code - don't try to apply it to the test code.
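Concretely, this means each test builds exactly the data it needs, inline, rather than reaching for a shared setup helper (the `apply_discount` function is invented for this sketch):

```python
def apply_discount(cart):
    """10% off for members; illustrative only."""
    if cart.get("member"):
        return cart["total"] * 0.9
    return cart["total"]

# The setup is repeated on purpose: each test is independently
# readable, and an error in one fixture cannot silently propagate
# to every other test the way a bug in a shared setUp() would.
def test_member_gets_discount():
    cart = {"total": 100, "member": True}
    assert apply_discount(cart) == 90.0

def test_guest_pays_full_price():
    cart = {"total": 100, "member": False}
    assert apply_discount(cart) == 100
```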
And, in my opinion, the most important and common misconception I see here, and I really feel that it should be more widely understood - and, in fact, that many problems with legacy code will likely largely stop occurring if this mindset becomes widespread:
* Why do we write unit tests?
We write unit tests to verify the behavior of written code with respect to various inputs, yes. But that is only the mechanics of writing unit tests, and I fear that that is what most people think is the sole function of unit tests; behind the mechanics of a method there should be a philosophy, and there is.
Unit tests actually serve a potentially (subjectively, I would say "perhaps almost always") far more vital purpose in the long term: when an engineer writes unit tests to verify the behavior of the code he has written, he is, in fact, writing down an explicit demonstration of what he intended the program to _do_; that is, he is, in a way, leaving a record of the design goals and considerations of the software.
(Slight aside: in my opinion, being a good software engineer does _not_ mean you write a clever solution to a problem and move on forever; rather, it means that you decompose the problem into its simplest useful components and then use those components to implement a solution whose structure is clear by design and is easy for others to read and understand. It further means (or should mean) that you then implement not only verification of the functionality you had in mind and of its robustness to invalid inputs which you cannot guarantee will never arrive, but implement it in such a way that it indicates what your design considerations were and serves as a guard against a change that unknowingly contradicts those considerations, made by someone else (or yourself!) at a later time.)
Later, when the code must be revisited, altered, or fixed, such unit tests, if well-written, immediately communicate what the intended behavior of the code is, in a way that cannot be as clearly (and almost certainly not as immediately) inferred from reading the source code alone.
In summary, these are the main points that stuck out to me in the conversations here; I do want to emphasize that the last point above is, in my opinion, the most glaring omission here, because it is an overall mindset rather than a particular consideration.
It's a good idea to take extra care when writing the generating code, as any brittleness in it is passed on to every dependent test.