

How (Much) Do You Test? - inventitech
http://www.codetrails.com/blog/test-analytics-testroots-watchdog

======
Morgawr
I honestly don't understand what the point of this is. What does the time
spent writing something have to do with its quality?

Tests are supposed to be simple and straightforward; I would hope to spend
less than 5% of my time actually writing tests. If I spend more, it means
that my tests are too complex to understand, read, and write, and that my
whole architecture is hard to test.

Are we trying to evaluate the quality of code (yes, tests are code too) by
the amount of time spent writing it? Should we go back to assembly because it
takes longer to write properly and is therefore somehow more correct?

I honestly don't understand the point, enlighten me, please.

~~~
inventitech
The first step is just to observe how much time is spent on testing. We have
basically no clue about this, and our first study with students showed that
they had no clue about it either.

In a second step, we can think about what implications testing time might
have for quality. As you correctly said, more time does not necessarily
correlate with better quality or higher productivity. Maybe there is a
certain range of testing effort that can be associated with good-quality
tests? E.g., if you spend less than x on testing, your tests are likely to be
bad. If you spend more than y on testing, it might be worth investigating
whether you have unusually high testing targets, or whether your tests are
extremely hard to maintain.

I think the answer is Janus-faced and there is no single, simple answer (see
also "Testivus on Test Coverage",
[http://www.artima.com/weblogs/viewpost.jsp?thread=204677](http://www.artima.com/weblogs/viewpost.jsp?thread=204677)).

------
codingdave
This looks like a great example of "Be careful what you measure, because that
is exactly what you will improve."

~~~
inventitech
Yes, this is certainly true, but (I think) only to a minimal extent. Why?
Because improving the time you spend on testing is inherently hard to do --
and it is only meta-information, anyway.

An example: a quality assessment of your code tells you that your methods are
too long. This gives you a concrete task: you simply split all too-long
methods. With testing time, this is not so: increasing the effort spent on
testing is something you cannot "just do" without a sensible plan. If you
want to increase this metric, you have to start thinking about what is wrong
with your current test strategy (maybe nothing is wrong at all) and come up
with an action plan for how to change it.

If you're that far into the game already, I think this metric has done what
it can do. Plus, you'll likely see an increase in the metric, which is
justified.

Establishing what (critical) testing efforts look like in the wild is now our
task.

------
PaulHoule
I find myself bouncing between at least three modes.

There are some things that I really know how to code correctly out of my
head, and in cases like that I sometimes put very little effort into testing.

If I am writing something greenfield that I don't completely understand (say,
some algorithm inspired by binary search, or any other place where
off-by-ones could eat you alive), then I would say there is little difference
between coding and testing; mostly I use tests and the Java debugger the same
way that Ruby or Python people use the REPL, except my tests get checked in
at the end.
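
A rough sketch of the idea (shown in Python with a made-up function, so
purely illustrative): exploratory assertions for an off-by-one-prone routine
that end up committed as a regression test.

    # Hypothetical example: a small search routine where off-by-ones bite,
    # plus the exploratory checks that get checked in as a regression test.
    import unittest

    def lower_bound(xs, target):
        """Return the first index i such that xs[i] >= target (xs sorted)."""
        lo, hi = 0, len(xs)
        while lo < hi:
            mid = (lo + hi) // 2
            if xs[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo

    class LowerBoundTest(unittest.TestCase):
        def test_edges_and_duplicates(self):
            # The boundary cases that usually expose off-by-one bugs.
            self.assertEqual(lower_bound([], 5), 0)
            self.assertEqual(lower_bound([1, 3, 3, 7], 0), 0)
            self.assertEqual(lower_bound([1, 3, 3, 7], 3), 1)
            self.assertEqual(lower_bound([1, 3, 3, 7], 8), 4)

    if __name__ == "__main__":
        unittest.main()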

Another case is dealing with legacy code where I often end up writing tests to
document existing behaviors and prove that the system behaves correctly after
refactoring. I've seen many code bases that were awful (written by the kind of
people who struggle to have unique primary keys) but were salvageable because
they had a test suite.
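
A minimal sketch of such a characterization test (the legacy routine and its
values are made up; the point is the pattern of pinning down existing
behaviour before refactoring):

    # Made-up stand-in for a gnarly legacy routine; in reality it would be
    # imported from the existing code base rather than defined here.
    def quote_price(customer_id, items):
        price_cents = items * 595
        if customer_id % 2 == 0:  # inexplicable legacy discount rule
            price_cents -= 100 * items
        return price_cents

    import unittest

    class QuotePriceCharacterization(unittest.TestCase):
        def test_existing_behaviour_is_preserved(self):
            # Values recorded from the running system, warts and all; a
            # refactoring that changes them should fail loudly.
            self.assertEqual(quote_price(customer_id=42, items=3), 1485)
            self.assertEqual(quote_price(customer_id=7, items=2), 1190)

    if __name__ == "__main__":
        unittest.main()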

Then, on top of that, there are a number of special cases too; for instance,
if you are doing something with threads that is at all complicated, you
probably want to write a load balancer, or if you're doing classic
Map/Reduce, you'd better test your mappers and reducers thoroughly before you
ever touch a cluster.
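
Since mappers and reducers are usually pure functions, they can be exercised
locally long before a cluster is involved; a toy word-count sketch (not tied
to any particular framework):

    # Toy word-count pieces, testable without any cluster.
    def mapper(line):
        return [(word, 1) for word in line.split()]

    def reducer(word, counts):
        return word, sum(counts)

    def test_word_count_locally():
        assert mapper("to be or not to be") == [
            ("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
        assert reducer("to", [1, 1]) == ("to", 2)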

------
i_ride_bart
Wow, the results are quite eye-opening. I'm admittedly not the most
"diligent" when it comes to testing (I write enough to pass code review), but
I would definitely fall into the camp of overestimating the amount of time
spent testing by 3x.

But I'm not quite sure if there is a direct correlation between time spent
writing tests and overall software quality?

~~~
inventitech
> But I'm not quite sure if there is a direct correlation between time spent
> writing tests and overall software quality?

Good point. We do not know, either. That is part of the reason why we do this
research. :)

As has been said before, the easier your tests are to understand and
maintain, the less time you actually need to spend on them.

However, you could argue that there _must_ be some minimal amount of time you
absolutely need to spend on QA work like testing to ensure a certain level of
quality.

We are investigating this.

------
IgorPartola
I generally don't write unit tests, and very rarely write system tests. Partly
this is because of the type of stuff I deal with (Django applications), partly
it's because I generally have access to good QA people, but mostly it's
because the kinds of errors I see would not be caught by the kind of testing I
can write in any reasonable amount of time. Here are some recent real-world
errors I've had to deal with, where testing wouldn't have saved me:

\- Poorly documented external API returns a redirect instead of a 404 in some
cases.

\- Missing <title> tag in a random HTML template.

\- Poorly assembled SSL certificate chain for an HTTPS service fails for some
browsers, but not all.

\- Celery tasks taking longer to process because there are now more things to
do as more users sign up.

\- Starting a DB transaction in the wrong place causes more failures than
necessary when only some of the modified rows cause constraint violations.

\- External API returns a 0-sized file with no errors instead of the correct
response, depending on time of day and phase of the moon.

\- CSS issue rooted in poorly set-up position and z-index properties causes
elements to be misaligned.

\- Missing clear: both;

\- Database server's disk filled up with log files from a rogue service.

\- Hosting provider gets DDoS'ed and their mitigation software starts
returning random site redirects.

\- Namecheap's DNS gets DDoS'ed and the site is inaccessible.

These are not the types of things that are easy to test, but very easy to
verify by hand or catch at runtime. Personally, I am much more in favor of
design-by-contract and logging and alerting of bad conditions at runtime.
That's not to say I wouldn't test a financial trading algorithm or a binary
search library function. One of my projects (LogHog, a much easier-to-use
syslog-type thing for Python) has lots of tests because it has a very simple
and well-defined interface. It's almost library code, and verifying its
correctness is both easy and useful at the same time. But not all projects
are like that, and sometimes easy testing is not actually useful and useful
testing is cost-prohibitive.
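
Roughly the style I mean, sketched with made-up names: enforce a small
contract on what an external API hands back, and log when it is violated,
instead of trusting the spec.

    # Illustrative only: the client object and endpoint are hypothetical.
    import logging

    logger = logging.getLogger(__name__)

    def fetch_report(client, report_id):
        """Fetch a report and enforce a minimal contract on the response."""
        response = client.get("/reports/%s" % report_id)
        # Contract: a successful call returns HTTP 200 and a non-empty body.
        if response.status_code != 200:
            logger.error("report %s: unexpected status %s",
                         report_id, response.status_code)
            raise RuntimeError("unexpected status %s" % response.status_code)
        if not response.content:
            logger.error("report %s: empty body returned", report_id)
            raise RuntimeError("empty response body")
        return response.content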

tl;dr: I can test the obvious stuff, but I can also just spend more time
writing it more carefully. It's the non-obvious stuff that breaks: disks fill
up, external APIs misbehave, bad CSS, etc. You will never have "100% code
coverage" because you are not testing all the stuff that actually affects
your system, and your users don't care why the application is not working;
they just care that it's not.

~~~
inventitech
I generally agree with you.

"sometimes easy testing is not actually useful and useful testing is cost-
prohibitive."

While I agree here, too, I think the situation where testing is not useful
happens very rarely in practice. Everything can break, and even when you
think it cannot possibly go wrong now, a future regression might occur.

I think badly written tests are a different problem, i.e. tests that, for
example, assert on a serialized string and hence take a lot of maintenance
effort to adapt to changing production code. But then there is nothing wrong
with the test per se; it's just badly implemented.
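
A made-up example of the difference: the first test couples itself to the
exact serialized string and breaks on any harmless formatting change, while
the second asserts on the parsed structure.

    import json

    def test_user_serialization_brittle():
        user = {"name": "Ada", "active": True}
        # Breaks whenever field order, spacing, or quoting of the serializer
        # changes, even though the data itself is unchanged.
        assert json.dumps(user) == '{"name": "Ada", "active": true}'

    def test_user_serialization_robust():
        user = {"name": "Ada", "active": True}
        # Survives formatting changes; only fails when the data changes.
        assert json.loads(json.dumps(user)) == {"name": "Ada", "active": True}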

~~~
IgorPartola
I can think of lots of cases where testing is not useful in practice.
Simulating random filesystem corruption when writing a Node.js app is pretty
much useless: you don't know which files will be corrupted, and testing would
take a huge (billions of lifetimes) amount of time to simulate all possible
combinations even for a small program. Yet this can (and in my experience
did) take down a production system.

The other examples I already listed are, I think, examples of this too: you
can test all you want against a spec provided by your external API
maintainer, but if they don't follow the spec, you have a problem. We don't
have a good framework (and I don't think we can create one) for testing
layout problems in HTML that are caused by bad JS or CSS.

Basically, you do hit diminishing returns quickly, and you do get a false
sense of security by having lots of small unit tests that don't actually
prevent realistic failures.

