

Even Tetris is hard to test - shalmanese
http://blog.jwhitham.org/2014/10/its-hard-to-test-software-even-simple.html

======
thu
The interesting bit that isn't mentioned is that all the given examples are
actually part of the game specification, i.e. those behaviours are there on
purpose. It means that if you have an accurate list of the desired features,
you could probably also achieve 100% coverage. It is also possible that
testing some features would not increase the code coverage.

"Hard to test" in the submission title didn't mean what I thought: I thought it
meant it was hard to write tests for Tetris, not that it was hard to recover a
complete specification of the game while playing it.

~~~
Joeri
The corollary is that you must have a specification in order to test
comprehensively. Either you use some form of TDD and your tests are your
specification (and any behavior not under test is undefined), or you have
exact and complete specifications that the as-built software can be compared
to, and any behavior not in the specification is undefined.

------
fidotron
This is amusingly naive. I've seen the spec for Tetris, and it's surprisingly
big - much larger than the smallest known fully conforming code, which is
still surprisingly large. There are also a non-trivial number of people knocking
around for whom it's their entire job to test it. "Even" Tetris? No, sorry -
that's nonsense.

The big problem here is that code coverage tests don't help you cover what you
should have explicitly defined or tested but didn't. As a result a lot of
things end up still defined by implementation and not specification, as all
sorts of important details only got defined during implementation.

------
mgraczyk
Nitpick: As an avid Tetris player, I have to say that the examples given for
"extremely rare" events actually happen quite often during normal play. In
particular, clearing 4 lines at once twice in a row will generally happen
within the first few minutes of gameplay and is sort of the whole goal (if
you're trying to maximize points). Wall kicks are less common but certainly
not rare enough that their processing code would be left uncovered after
playing a few games.

~~~
clebio
Are there more rare scenarios that you can think of?

~~~
shalmanese
There are a lot of special conditions around having almost the entire playing
field filled in.

------
userbinator
One way to reduce the number of possible paths is to reduce the number of
distinct cases, often via refactoring or a simpler yet more general algorithm;
to use an extremely contrived example, it's like the difference between

    
    
        int addOne(int x) {
          if(x == 0)
            return 1;
          else if(x == 1)
            return 2;
          ...
        }
    

and

    
    
        int addOne(int x) {
          return x + 1;
        }
    
    

I always keep in mind the famous Dijkstra quotes about testing and program
complexity:

"Program testing can be used to show the presence of bugs, but never to show
their absence!"

"Simplicity is prerequisite for reliability."

~~~
Too
This is a perfect example of why line/function coverage is a silly
measurement. It doesn't take into account global state or function input.
Take the second function above: you could get 100% line coverage in just one
run. But it does _exactly_ the same thing as the first function, where you
only got 50% coverage, so things already smell fishy. You can also test the
function with millions of different inputs, all of them giving the same 100%
coverage, and it will work perfectly fine, until suddenly you try
addOne(int.max) and it fails.

What you really want, to be sure, is state coverage, or input range coverage
assuming your functions are pure. Testing every function across the whole
range from int.min to int.max might seem unrealistic, so what you have to do
is constrain the possible range of input, or divide it into ranges and special
cases that you can somehow group together: say, int.min, negative numbers,
zero, positive numbers and int.max.

Also, just because you covered a line doesn't mean it's correct; the only
thing you've really tested is that the program doesn't crash. For the test to
be really useful you also need the expected output, supplied by a human. You
can't just randomize input to increase the coverage.

~~~
asveikau
I came to this thread hoping somebody would say pretty much exactly your first
paragraph. The author appears to be selling a code coverage tool so of course
he falls into the trap. In real life you have to remember that hitting a line
once is not the same as showing it to be correct. People who buy into code
coverage tools make this mistake a lot.

------
singingfish
Bah, one line of BBC BASIC. One line of _incomprehensible_ BBC BASIC, more like.
When I was at school I implemented a game in three lines of Spectrum BASIC. 5
if you wanted scoring. And it was readable. And it was the most popular
computer game in the school by virtue of it being quicker to type than loading
a tape, and being moderately entertaining. Just bound the cursor keys to the
values in the line() function. The game crashed if the line went out of bounds
:).

------
zach
This is a great example of the value of test plans. This is basically
technology to reconstruct missing test plans via the code. But of course
someone already knew about the subtle "wall kick" feature since she or he
wrote that code. It shouldn't be this hard, with some effective communication.

And actually, especially in games, test plans are still poorly communicated.
In the old days, it was awful -- you would have the publisher doing all the
QA, and barely speaking to the development team apart from bug reports. QA
still often doesn't get involved until the last half of the project, before
which time nobody has been thinking much about testing.

As studios improve their production, this is getting better. As a programmer,
I've had more collaboration with QA as I work, at its best including having
our QA liaison talk out a test plan with me while I'm working the feature.
With enough communication, hopefully this kind of detective work to figure out
what to test can be avoided.

~~~
nanomage
Agreed. Devs see the specs, and QA sees the holes. I get to participate from
the initial planning into release. As a QA person I feel very lucky.

With regard to the article, I would wager the better scores will be found by QA
than developers :-P

------
forrestthewoods
Interesting. I do wonder what the actual value of 100% coverage is. I'm not
saying it's not substantial, I'm just curious how many cases that still
misses. There are a lot of permutations of data that can be used by code, how
many are possible and how many cause issues?

~~~
lucaspiller
One of the things I've found as a side effect of people using code coverage
tools is that instead of testing the behaviour of a method they end up testing
the implementation. I think this is because they initially test the behaviour,
but then see that one path is missing, so add a test to ensure that path is
run - instead of just testing the behaviour that calls that path and checking
the code coverage tool. This ends up causing trouble if you ever want to
change the implementation as you end up having to throw away half the tests,
which means a lot of effort you spent to get 100% code coverage is now gone.

~~~
canadev
I've faced this problem in my own tests. I want to achieve total coverage so
that I know that I've got all the cases covered, but then I end up testing the
implementation rather than the contract. I'm not sure what to do about it.

~~~
aikah
You can't have it both ways, in my opinion. High test coverage == testing every
possible path == looking at implementation details. If you are testing an
algorithm (and games are full of these) you want it to be 100% accurate,
therefore you don't have much choice.

~~~
alkonaut
Tests should be based on the specification. If I want to change some internal
implementation detail I should only have to verify that the current tests
pass.

If, e.g., a game contains a sort somewhere in the renderer, I can replace
the quicksort with a mergesort as long as the renderer interface still
tests OK. The new sort algorithm may have new special-case paths (an even
number of items vs. an odd one, for example), but that's not a concern of the
renderer's public interface. I may, however, have introduced a bug with an odd
number of items here: the old code was 100% covered and now it isn't. So there
is a potential problem, and the drop to 99% has actually helped spot it.

If the sorting is a private implementation detail of the renderer then there
is no other place to test it than to add a new test to the renderer component
only because the sorting algo requires it for a code path. This is BAD.

The proper action here is NOT to add tests to the renderer component to test
the sorting code path, but instead to make the sorting visible and testable in
isolation via its own public interface.

So one of the positive things about requiring coverage is that if you do it
right, it will lead to smaller and more decoupled modules of code.

The bad thing is that if you do it wrong you will have your God classes and a
bunch of tests coupled tightly to them.

------
clebio
Nice to see a context and link to code specifications in a high-risk and
highly-regulated industry. 'Work slowly and don't break anything' would make a
good poster.

([http://en.wikipedia.org/wiki/DO-178B](http://en.wikipedia.org/wiki/DO-178B))

------
ajuc
Meanwhile, the BBC BASIC one-line version has 100% code coverage the moment you
start it :)

~~~
davidrusu
Well, if your code coverage scope is one line that's true, but it would make
more sense to scope on statements.

------
nathancahill
TL;DR: Use a coverage tool for tests.

