I quickly realized that my job was futile - many of the "good" tests had already been written, while in other places, "bad" tests were entrenched and the "test dev" had the job of manning the scanner to check for nail clippers, or upgrading the scanner to find as many nail clippers as possible.
Here's a useful rule on the subject that I picked up from an Artificial Intelligence course back in the day: The value of a piece of information is proportional to the chance that you will act on it times the benefit of acting on it. We all realize there is no benefit in testing if you ignore failures rather than acting to fix the bugs, but in much the same way that doing nothing when tests fail has no benefit, doing nothing when tests pass also has no benefit - so tests which always pass are just as useless as failing tests you ignore, as are tests which only turn up corner-case bugs that you would have been comfortable with shipping.
If you're doing the right amount of testing, there should be a good chance, whenever you kick off a test run, that your actions for the next hour will change depending on the results of the run. If you typically don't change your actions based on the information from the tests, then the effort spent to write tests gathering that information was wasted.
That is the point of the "changing behavior" rule - you do not gather the benefit of running a test until it has failed at least once, and the benefit gathered is proportional to the benefit of the action you take upon seeing the failure. The tricky part of the rule is that you must predict your future actions, since a test that might have a very important failure later could pass all the time right now. Knowing your own strengths and weaknesses is important, as is knowing the risks of your project.
There are possible design benefits to writing tests, since you must write code that is testable, and testable code tends to also be modular. However, once you have written testable code, you still gain those design benefits even if you never run your test suite, or even delete your tests entirely!
Based on your follow-up, it is clear that my reading was not what you intended.
For example, if you personally tend to write off-by-one errors a lot, it's a good idea to write tests which check for them. On the other hand, if you almost never write off-by-one errors, you can skip those tests. If a test is cheap to write, easy to investigate, and covers a piece of code that would cause catastrophic problems if it failed, it's worthwhile to write the test even if you can barely imagine a situation where it would fail - the degree of the cost matters as much as the degree of the benefit.
You don't "know" when it's OK to nuke a test just as you don't really "know" when it's safe to launch a product - you decide what you're doing based on experience, knowledge, and logic. The important step many don't take is developing the ability to distinguish between good tests and bad tests, rather than simply having an opinion on testing in general.
When we say that the future will be similar to the past, for code, we really mean that the probability of certain events occurring in the future will be similar to their prior probability of occurring in the past.
In my hypothetical example of testing an invariant that is unlikely to fail but damaging if it does, it might be valuable to keep that test around for five years even if it never fails. Imagine that the expected frequency of failure was initially "once per ten years", and that the test hasn't failed after five years. If the expected frequency of failure, cost of failure, and gain from fixing a failure remain the same, we should keep the test even if it's never failed: the expected benefit is constant.
Not to say that we should test for every possible bug, but if something is important enough in the first place to test for it, and that doesn't change (as calculated by expected benefit minus expected cost of maintenance), we should keep the test whether or not it changes our behavior.
Thus, if we could estimate probabilities correctly, we really would know when it's OK to nuke a test.
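The keep-or-nuke calculation above can be made concrete. This is a sketch with invented numbers, and it assumes failures arrive at a roughly constant rate (so five failure-free years don't change the estimate, as in the once-per-ten-years example):

```python
# Sketch of "expected benefit minus expected cost of maintenance"
# for deciding whether to keep a test. All figures are hypothetical.

def expected_annual_value(failures_per_year: float,
                          gain_per_caught_failure: float,
                          annual_maintenance_cost: float) -> float:
    """Expected yearly value of keeping a test: expected failures
    caught times the gain from fixing each, minus upkeep."""
    return failures_per_year * gain_per_caught_failure - annual_maintenance_cost

# The invariant test: expected to fail once per ten years, but a
# missed failure would be very damaging, and upkeep is cheap.
v = expected_annual_value(failures_per_year=0.1,
                          gain_per_caught_failure=500.0,
                          annual_maintenance_cost=5.0)

# 0.1 * 500 - 5 = 45 > 0, and as long as those estimates hold, the
# number is the same in year five as on day one: keep the test.
keep = v > 0
```

If the maintenance cost creeps up or the invariant stops mattering, the same arithmetic flips sign, and that - not whether the test has recently failed - is when nuking it is justified.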
> We all realize there is no benefit in testing if you ignore failures rather than acting to fix the bugs, but in much the same way that doing nothing when tests fail has no benefit, doing nothing when tests pass also has no benefit - so tests which always pass are just as useless as failing tests you ignore, as are tests which only turn up corner-case bugs that you would have been comfortable with shipping.
On the contrary, my opinion is that we don't see this often enough. Our bug database shows the current list of defects, and there is very very very little data on what does work. What are the tests covering, and how many defects are they finding within that coverage?
If your bug trend is heading downward, is it because the test org is distracted by something, or because there are fewer bugs that they are encountering?
This is the danger of having a 'test org' separate from the 'dev org'. When writing tests is tied to writing production code, your progress in one is tied to progress in the other. If it's easy to write tests for a particular feature, the developer finishes the easy tests and then gets on with writing the feature. It's much easier to understand your coverage when you're actually working with and writing the code being covered, rather than working in some separate "test org" - you don't need to run a coverage tool, you just see what breaks if you comment out a chunk of code. If the answer is "nothing", then it wasn't covered!
At the end of the day, an automated test suite is a product for developers on your project in the same way that a car is a product for drivers. You will have a hard time making a car if nobody on your car-making team can drive, and you will have a hard time writing a test suite as a tool to develop Project Foo if nobody on your test-writing team develops Project Foo.
I now work on a project where I write both the code and the tests. In the same way that my production code is improved by writing tests, my tests are improved by writing production code. I know what is included in the test coverage in the same way that I know what I had for lunch today - because I was there and I remember what happened. Tools for checking coverage are an aid to my understanding; in a company with a separate test org, you don't know anything about coverage until you've run the tool.
I completely agree with you, in light of how MSFT categorizes their ICs these days. There used to be three test/dev disciplines when I started (in 2001): "Software Test Engineers", "Software Development Engineers in Test", and "Software Development Engineers" (you likely know this already, but it might be useful history for others to know). Around 8-9 years ago the STE discipline disappeared; the company hired for SDETs (like yourself) from that point on.
The big loss here was that now all STEs were expected to be SDETs - writing code on a regular basis to test the application. There were (notice the past tense) many STEs who knew our application very, very well, and knew how to break it, hard. STEs provided quality feedback at a higher level than what a unit test provides - does the application feel right? Are customers getting their problems solved? If you have your eyeballs stuck inside a unit test, it's difficult to step 30 feet back and ask yourself if you're even doing the right thing.
Nowadays I feel like the pendulum is slowly swinging back the other way, and there is less drive to have testers write automation (maybe because of where we are in the product cycle). I understand that high-level STEs "back in the day" were probably concerned about potential career progression at the time, which might be why the STE discipline was eliminated (I have no idea of the real reasons - they've been lost to Outlook's mailbox size limits), but all in all I think MSFT is poorer because of it.
I can see the usefulness of SDETs in doing system / end-to-end testing or testing really obscure scenarios, but most of the test writing should belong to devs. I love the Rails approach of splitting tests into unit, functional, and integration suites. The first time you try BDD, especially if you're coming from an after-the-fact testing culture like the one above, you almost want to cry from joy. I agree that Cucumber might be a bit of overkill, but perhaps I just don't get it. For a non-prototypey project you should absolutely add other kinds of horizontal testing, like performance and security.
(If you are designing a life-critical application, the scope of this article doesn't apply to you, as is stated in the article)