Hacker News

This is why I left Microsoft. Automated testing was a separate discipline - meaning there were "dev devs" and "test devs". Automated tests were written based on what the "test devs" had time for, not on the need or usefulness of such tests for the actual code. I was hired as a "test dev" - I had no industry experience at the time and figured I would give it an unprejudiced try to see if I liked it.

I quickly realized that my job was futile - many of the "good" tests had already been written, while in other places, "bad" tests were entrenched and the "test dev" had the job of manning the scanner to check for nail clippers, or upgrading the scanner to find as many nail clippers as possible.

Here's a useful rule on the subject that I picked up from an Artificial Intelligence course back in the day: The value of a piece of information is proportional to the chance that you will act on it times the benefit of acting on it. We all realize there is no benefit in testing if you ignore failures rather than acting to fix the bugs, but in much the same way that doing nothing when tests fail has no benefit, doing nothing when tests pass also has no benefit - so tests which always pass are just as useless as failing tests you ignore, as are tests which only turn up corner-case bugs that you would have been comfortable with shipping.

If you're doing the right amount of testing, there should be a good chance, whenever you kick off a test run, that your actions for the next hour will change depending on the results of the run. If you typically don't change your actions based on the information from the tests, then the effort spent to write tests gathering that information was wasted.
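The rule above can be sketched as a quick calculation. This is only an illustration: the `information_value` helper, the test names, and every number below are hypothetical, not taken from any real project.

```python
def information_value(p_act: float, benefit: float) -> float:
    """Value of information: chance you act on it times the benefit of acting."""
    if not 0.0 <= p_act <= 1.0:
        raise ValueError("p_act must be a probability in [0, 1]")
    return p_act * benefit


# Hypothetical tests: (chance a run changes what you do next,
#                      benefit of that action, in hours saved)
tests = {
    "result_never_acted_on": (0.0, 40.0),   # ignored failure, or a pass you never act on
    "shippable_corner_case": (0.3, 0.0),    # fails sometimes, but you would ship anyway
    "catches_real_regressions": (0.2, 8.0),
}

values = {name: information_value(p, b) for name, (p, b) in tests.items()}

for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

Under these made-up numbers, only the third test carries any value: either factor being zero wipes out the product.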

I disagree that there is no benefit in passing tests that don't change your behavior. Those tests are markers to prevent you from unknowingly doing something that should have changed your behavior. That is where the nuance enters: is this a marker I want to lay down or not? Some markers should be there; others absolutely should not and just introduce noise.

I don't understand what you mean by "prevent you from unknowingly doing something that should have changed your behavior". If you do something without knowing it, how could it change your behavior? If it's a case where you should have changed your behavior, why would you prevent it?

I believe the above comment refers to regression testing. For instance, if I write a test for an invariant that is fairly unlikely to change, then the chance that my behavior will change in the next hour based on the test run is small. However, if and when the invariant is mistakenly changed, even though negative side effects might not be immediately visible, it could be immensely valuable to me to see the flaw and restore that invariant.

Yes - but the test would fail when the invariant is mistakenly changed. On the test run after the invariant was changed, you would get new information (the test does not always pass) and change your behavior (revert the commit which altered the invariant).

That is the point of the "changing behavior" rule - you do not gather the benefit of running a test until it has failed at least once, and the benefit gathered is proportionate to the benefit of the action you take upon seeing the failure. The tricky part of the rule is that you must predict your actions in the future, since a test that might have a very important failure later could pass all the time right now. Knowing your own weaknesses and strengths is important, as is knowing the risks of your project.

There are possible design benefits to writing tests, since you must write code that is testable, and testable code tends to also be modular. However, once you have written testable code, you still gain those design benefits even if you never run your test suite, or even delete your tests entirely!

Your comment reads like you can know when a test will fail in the future (how else can you know the difference between a test that "always passes" and a test that will fail in the future to identify a regression?). You may have a test that passes for ten years. When do you know it's OK to nuke the test?

Based on your follow-up, it is clear that my reading was not what you intended.

You can't know, but you can guess, based on past experience or logic. The simplest way to estimate the future is to guess that it will be similar to the past.

For example, if you personally tend to write off-by-one errors a lot, it's a good idea to write tests which check for that. On the other hand, if you almost never write off-by-one errors, you can skip those tests. If a test is cheap to write, easy to investigate, and covers a piece of code that would cause catastrophic problems if it failed, it's worthwhile to write the test even if you can barely imagine a possible situation where it would fail - the degree of the cost matters as much as the degree of the benefit.

You don't "know" when it's OK to nuke a test just as you don't really "know" when it's safe to launch a product - you decide what you're doing based on experience, knowledge, and logic. The important step many don't take is developing the ability to distinguish between good tests and bad tests, rather than simply having an opinion on testing in general.

Re: "The simplest way to estimate the future is to guess that it will be similar to the past."

When we say that the future will be similar to the past, for code, we really mean that the probability of certain events occurring in the future will be similar to their prior probability of occurring in the past.

In my hypothetical example of testing an invariant that is unlikely to fail but damaging if it does, it might be valuable to keep that test around for five years even if it never fails. Imagine that the expected frequency of failure was initially once per ten years, and that the test hasn't failed after five years. If the expected frequency of failure, cost of failure, and gain from fixing a failure remain the same, we should keep the test even if it's never failed: the expected benefit is constant.

Not to say that we should test for every possible bug, but if something is important enough in the first place to test for it, and that doesn't change (as calculated by expected benefit minus expected cost of maintenance), we should keep the test whether or not it changes our behavior.

Thus, if we could estimate probabilities correctly, we really would know when it's OK to nuke a test.
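The keep-or-nuke argument above can be sketched as arithmetic. The once-per-ten-years failure rate comes from the example; the gain and maintenance figures, and the function names, are hypothetical assumptions chosen only for illustration.

```python
def expected_net_benefit_per_year(
    failures_per_year: float,
    gain_per_caught_failure: float,
    maintenance_cost_per_year: float,
) -> float:
    """Expected benefit minus expected cost of maintenance, per year."""
    return failures_per_year * gain_per_caught_failure - maintenance_cost_per_year


def keep_test(
    failures_per_year: float,
    gain_per_caught_failure: float,
    maintenance_cost_per_year: float,
) -> bool:
    """Keep the test while its expected net benefit stays positive."""
    return expected_net_benefit_per_year(
        failures_per_year, gain_per_caught_failure, maintenance_cost_per_year
    ) > 0


# Expected failure once per ten years (from the example above);
# 200 hours gained per caught failure and 5 hours/year of upkeep are assumed.
RATE, GAIN, UPKEEP = 0.1, 200.0, 5.0

# The number of years the test has already passed is not an input:
# if the estimates are unchanged, the decision in year 5 matches year 0.
decision_year_0 = keep_test(RATE, GAIN, UPKEEP)
decision_year_5 = keep_test(RATE, GAIN, UPKEEP)

# If the gain estimate drops (say, to 20 hours), the same test is no longer worth keeping.
decision_low_gain = keep_test(RATE, 20.0, UPKEEP)
```

The design choice here is that a clean pass history only matters if it leads you to revise the failure-rate estimate; on its own it does not enter the calculation.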

Hey Evan ;)

> We all realize there is no benefit in testing if you ignore failures rather than acting to fix the bugs, but in much the same way that doing nothing when tests fail has no benefit, doing nothing when tests pass also has no benefit - so tests which always pass are just as useless as failing tests you ignore, as are tests which only turn up corner-case bugs that you would have been comfortable with shipping.

On the contrary, my opinion is that we don't see this often enough. Our bug database shows the current list of defects, and there is very, very little data on what does work. What is the test org covering, and how many defects is it finding within that coverage?

If your bug trend is heading downward, is it because the test org is distracted by something, or because there are fewer bugs that they are encountering?

Hey Jonathan ;)

This is the danger of having a 'test org' separate from the 'dev org'. When writing tests is tied to writing production code, your progress in one is tied to progress in the other. If it's easy to write tests for a particular feature, then the developer stops writing tests and writes the feature once they're done with the easy tests. It's much easier to understand your coverage when you're actually working with and writing the code being covered, rather than working in some separate "test org" - you don't need to run a coverage tool, you just see what breaks if you comment out a chunk of code. If the answer is "nothing" then it wasn't covered!

At the end of the day, an automated test suite is a product for developers on your project in the same way that a car is a product for drivers. You will have a hard time making a car if nobody on your car-making team can drive, and you will have a hard time writing a test suite as a tool to develop Project Foo if nobody on your test-writing team develops Project Foo.

I now write a project where I handle both the code and the tests. In the same way that my production code is improved by writing tests, my tests are improved by writing production code. I know what is included in the test coverage in the same way that I know what I had for lunch today - because I was there and I remember what happened. Tools for checking coverage are an aid to my understanding; in a company with a separate test org, you don't know anything about coverage until you've run the tool.

> This is the danger of having a 'test org' separate from the 'dev org'.

I completely agree with you, in light of how MSFT categorizes their ICs these days. There used to be three test/dev disciplines when I started (in 2001): "Software Test Engineers", "Software Development Engineers in Test", and "Software Development Engineers" (you likely know this already, but it might be useful history for others to know). Around 8-9 years ago the STE discipline disappeared; the company hired for SDETs (like yourself) from that point on.

The big loss here was that now all STEs were expected to be SDETs - writing code on a regular basis to test the application. There were (notice the past tense) many STEs who knew our application very, very well, and knew how to break it, hard. STEs provided quality feedback at a higher level than what a unit test provides - does the application feel right? Are customers getting their problems solved? If you have your eyeballs stuck inside a unit test it's difficult to step 30 feet back and ask yourself if you're even doing the right thing.

Nowadays I feel like the pendulum is slowly swinging back the other way, and there is less drive to have test write automation (and maybe that is because of where we are in the product cycle). I understand that high-level STEs "back in the day" were probably concerned about potential career progression at the time, which might be why they eliminated STEs (I have no idea of the real reasons - they've been lost to Outlook's mailbox size limits), but all in all I think MSFT is poorer because of it.

I worked on enterprise projects "at a similar company" with a few hundred "top notch" engineers. We had no more than 30% coverage, most of it from SDETs. For months, no tests were being written by the devs before checking in, and the test devs were understaffed, unqualified, and thus behind. At one point someone made a checkin that prevented login from happening. Nobody noticed for a full work week, until that version made it into preproduction and someone finally attempted to log in. Apparently hundreds of mils spent on the project can't buy you a team able to write above high-school level code.

I can see the usefulness of SDETs in doing system / end-to-end testing or testing of really obscure scenarios, but most of the test writing should belong to devs. I love the Rails approach of splitting tests into unit, functional, and integration tests. The first time you try BDD, especially if you're coming from an after-the-fact testing culture like the one above, you almost want to cry from joy. I agree that Cucumber might be a bit of overkill, but perhaps I don't get it. For a non-prototypey project you should absolutely add other types of horizontal testing, like performance and security.

While I agree with this description of the value of information, I disagree with your interpretation of it in this context. Consider the following, rather extreme example: nuclear power stations are equipped with countless diagnostic systems with several levels of fallback. In a well-built and well-operated nuclear power station these systems will never signal failure during normal operation. This clearly doesn't mean that the output of these systems carries little value. Surely a test that always passes isn't necessarily without benefit; you also have to consider what it would mean if it suddenly stopped passing.

You are not designing nuclear power stations. You are at best designing a piece of business software.

(If you are designing a life-critical application, the scope of this article doesn't apply to you, as is stated in the article)
