
Unit testing anti-patterns: Structural Inspection - ScottWRobinson
https://enterprisecraftsmanship.com/2016/07/21/unit-testing-anti-patterns-structural-inspection/
======
sudhirj
The reductio ad absurdum is tests that essentially parse the code they're
testing and assert on the AST.

Any unit test should really be from the point of view of checking what the
code _does_ , as opposed to checking what code was _written_.

Structural tests by definition don't check what the code does - they're a way
to parse the code and check that the AST is structured a certain way.

------
mwkaufma
Alternatively, don't write a runtime test for something the static type system
can encode directly. In this case, have three members for the subprocessors
with their concrete types, rather than a generic virtual-method-dispatch list.
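
A minimal Python sketch of that idea (the subprocessor names are invented for
illustration; in Python the wiring is verified by a static checker like mypy
rather than a compiler, but the principle is the same):

```python
from dataclasses import dataclass

# Hypothetical subprocessors -- names invented for illustration.
class HeaderProcessor:
    def run(self, data: str) -> str:
        return data.upper()

class BodyProcessor:
    def run(self, data: str) -> str:
        return data.strip()

class FooterProcessor:
    def run(self, data: str) -> str:
        return data + "\n-- end --"

@dataclass
class Pipeline:
    # Three concretely typed members instead of a generic list of some
    # Processor base class: a static checker verifies the wiring, so no
    # runtime test needs to inspect the structure.
    header: HeaderProcessor
    body: BodyProcessor
    footer: FooterProcessor

    def process(self, data: str) -> str:
        return self.footer.run(self.body.run(self.header.run(data)))

pipeline = Pipeline(HeaderProcessor(), BodyProcessor(), FooterProcessor())
result = pipeline.process("  hello  ")
```

The composition is spelled out in the types themselves, so there is nothing
left for a structural test to assert at run time.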

------
slaymaker1907
Kind of specific, but I find that in the React world, Enzyme is worse than
useless for the same reasons as mentioned in this article. The tests you end
up writing just verify your XML (which generally has little to no value). Even
if there is some tricky logic which needs testing, you can pretty much always
refactor your code a little bit and use standard unit tests.

~~~
Zalastax
Manually writing Enzyme tests looks dumb to me but what about Jest snapshots?
Their only purpose is to detect changes to prevent accidental behaviour
changes. There's very little work that needs to be done to maintain those
tests so I want to think they provide value. An explicit git log of how the
rendered view has changed seems useful to me - especially in a pull request.
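
The snapshot workflow can be sketched in a few lines of Python (the
`render_view` function and snapshot file layout are illustrative, not Jest's
actual internals):

```python
import tempfile
from pathlib import Path

def render_view(user: dict) -> str:
    # Stand-in "component" -- whatever actually produces the rendered output.
    return f"<div class='card'><h1>{user['name']}</h1><p>{user['role']}</p></div>"

def assert_matches_snapshot(name: str, actual: str, snapshot_dir: Path) -> None:
    # First run records the baseline; later runs compare against it.
    # A failing diff means the rendered output changed; updating the file
    # is an explicit, reviewable act that shows up in git history.
    snapshot = snapshot_dir / f"{name}.snap"
    if not snapshot.exists():
        snapshot.write_text(actual)
        return
    assert actual == snapshot.read_text(), f"snapshot {name!r} changed"

snap_dir = Path(tempfile.mkdtemp())
html = render_view({"name": "Ada", "role": "admin"})
assert_matches_snapshot("user_card", html, snap_dir)  # records the baseline
assert_matches_snapshot("user_card", html, snap_dir)  # unchanged: passes
```

The value is exactly the git-log property described above: any behaviour
change forces a visible, reviewable diff of the stored snapshot.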

------
marcosdumay
Action anti-patterns: Shooting yourself in the foot. Avoid this.

The fact that this article needed to be written at all, and that it's so clear
why it's here and why it was upvoted, is depressing.

~~~
sametmax
There is nothing obvious in life.

I often feel like you do, so when I notice myself going this way, I read
[https://xkcd.com/1053/](https://xkcd.com/1053/) again.

~~~
petersellers
On top of that, even if this is something you might have seen before it helps
to put a name to a face, so to speak. Structural Inspection is a good name for
this anti-pattern.

------
mikekchar
Although I don't disagree that the testing of structure in the example given
is not particularly good practice, I disagree with the overall message.

There is a difference between "unit tests" and "acceptance tests". "Does this
test verify some business requirement" is a question aimed at the latter, not
the former. There is nothing wrong with writing automated acceptance tests
(and I highly encourage it), but they are not a replacement for unit tests.

What are unit tests for? They are for helping you reason about the effect of
changes on your code. There are two main questions a unit test should answer.
1: What happens when I do X (where X is calling some code)? 2: Does the
behaviour change if I modify the production code?

The first question is for helping you write the code in the first place (or
for modifying it subsequently). It is not a question of the application
operating properly, it's a question of the "unit" operating the way you
expect. You should be able to use this information to reason quickly about
what went wrong in the case that behaviour in the system changes unexpectedly:
"Unit a" should behave like X, and I can see that it does so by looking at the
tests. If that's the case, then "Unit y" must be at fault.

You can think of these kinds of tests as being sort of like predefined watch
points in your debugger. You should expose state in your units and add tests
so that you can understand the state of the unit if it doesn't behave as
expected. You should be thinking, "If the behaviour of this unit is
unexpected, what things would I want to look at when determining why".

You might be wondering (as many do) what a "unit" is. The answer is that it
can be anything: a function, a class, a global object, a group of
classes/objects acting in concert. It really doesn't matter. Some people like
to fix the idea of a "unit" as a class and also add "integration tests". I
have no problem with that, other than I have seen no real value in the
practice. Mostly I think people find it easier to envision how to name the
files that contain their tests.

Many styles of testing do a relatively good job of doing what I describe
above. However many people isolate their "units" too much. I think this is an
unfortunate consequence of the word "unit" which seems to imply "atomic unit"
to many people. The word (IMHO) refers more to simply an identifiable chunk of
code. Often people avoid using real objects as collaborators in their tests in
the mistaken belief that it will help with the specificity of their tests.

The second question, "Does the behaviour change if I modify the production
code" is as important as the first question. Imagine a scenario where you have
a series of either manual or automated acceptance tests. You run those tests
and you feel confident that the application is running as you expect. You then
modify some code. At this point, it would be very nice to be told what
behaviour (if anything) has changed in the application.

You might be forgiven for thinking that's the same as a regression test.
However, it is not necessary to actually write regression tests to give you
that information. We don't actually need to know whether or not the behaviour
is correct -- we only need to know if it is different (and ideally, what is
different).

As an example, I once worked on code that converted files to and from MS Word
file format. I didn't write any of the original code and it was pretty stinky,
TBH. Every time I made a change, I would likely break something because I
couldn't reason about the code. However, we had about 2000 documents that QA
had already tested and deemed acceptable. I wrote a quick hack that would
allow me to compare the "accepted" output for these documents with the new
outcomes. It didn't try to understand _what_ was wrong, just that it was
different. In fact, most of the time the changes were _good_ because I was
improving the conversions.

Obviously, the above is not a unit test system. However, the point is that we
don't need to write complicated tests to determine if the system is working
correctly. All we need is to have a known baseline for the system and then be
alerted when the behaviour changes -- either in a "good" or "bad" way. Then we
can have a quick look to see if the change in behaviour was what we expected.
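
A minimal sketch of that baseline-comparison idea in Python (the converter
and documents here are trivial stand-ins, not the actual MS Word code):

```python
def convert(doc: str) -> str:
    # Stand-in for the real converter under test.
    return doc.replace("\t", "    ")

def changed_outputs(inputs: dict, accepted: dict) -> list:
    # Compare current output against the previously accepted output.
    # We don't assert what is *correct* -- we only report what *changed*.
    # A human then decides whether each diff is an improvement or a regression.
    return [name for name, doc in inputs.items()
            if convert(doc) != accepted.get(name)]

accepted = {"a.doc": "x    y", "b.doc": "plain"}   # QA-approved baseline
inputs = {"a.doc": "x\ty", "b.doc": "plain"}
assert changed_outputs(inputs, accepted) == []      # behaviour unchanged
```

When the converter is changed, any names that come back from
`changed_outputs` are exactly the documents worth a quick human look.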

In that way, the second criterion for good unit tests is that at least one unit
test "fails" when I change the behaviour of the system. A lot of people equate
TDD (Test Driven Development) with Test First Development. This is unfortunate
because it steers people off course. Test First is a good way to learn TDD,
but it's not by any stretch of the imagination the only way to do TDD. When I
have a piece of code with good unit tests, I will often modify production code
first and see what tests fail. If I have changed behaviour and no tests fail,
I know I'm in big trouble. Similarly, if 200 tests fail and I have no
specificity in my tests, I know I'm going to be in for a hard ride. These are
the kinds of things you should be doing to test the quality of your tests.

I'll finish this with one last thought. There are 2 classes of "customer" for
tests: the end user and the programmer. The user is concerned about whether or
not the application is working as desired. They don't care what is technically
wrong and they don't need to reason about how to change unexpected behaviour.
The programmer doesn't actually need to know what the correct behaviour is for
everything in the system. They need to have a known baseline of behaviour and
to be told when the behaviour has changed. They also need to know if the very
specific piece of code that they are currently working on is behaving as
expected. The customer for unit tests is the programmer, not the end user.
Having tests for the end user is a great idea, but doesn't actually help
tremendously with improving the development experience. The golden rule for
programming tests as a programmer: if it helps you reason about what you are
doing, then it's a good idea. If it doesn't, then you should avoid it.

~~~
krytenboot
This is a great comment. As a small addition to it, a good technique for
assessing whether your unit tests are able to catch behavior changes in an
application when the code changes is Mutation Testing.

This is a small explanation and demo I made recently with Stryker, a framework
for the JavaScript ecosystem:
[https://github.com/peter-evans/mutation-testing](https://github.com/peter-evans/mutation-testing)

This is a good list of available frameworks in other languages:
[https://github.com/theofidry/awesome-mutation-testing](https://github.com/theofidry/awesome-mutation-testing)

------
ThalesX
Hmm, maybe some sort of introduction to why one might try to unit test their
code structure would help some people like me. I don't think I've ever had the
thought of unit testing my code structure.

What might I use it for?

~~~
adrianratnapala
I assume people fall into doing it without realising.

Unit tests are supposed to hermetically test individual components of a
system. In object-oriented environments, those components are often classes.
But classes are typically quite small; very often a particular class exists
only to serve some implementation detail of your overall system. This is
especially true of the various Visitors, Factories, Adaptors etc. promoted by
OOP theory.

When you combine "unit test every class" with "express your code structure
through classes", you accidentally end up with "unit test your structure".

~~~
projektir
This inevitably happens if your process insists on high test coverage.

------
mlthoughts2018
Structural inspection is sometimes a very good test idea, especially in a
dynamically typed language like Python.

For example, a certain class might get dynamically enhanced with extra methods
or extra structure at run time, depending on some config or something.

I’ve found many examples like this where needing to conform the design to a
static typing compiler would be a huge waste of time and involve some stupid
pile of abstraction that is totally inappropriate for the problem.

Instead, there may be 3 or 4 highly controlled “mix-in” options (where the
choice of mix-in happens at run time and is dynamically modified or
configurable), and it’s perfectly safe to write it with monkey-patching or
dynamically defining a class with type(...).

In the unit tests, you may sincerely want to verify that when the system is
fed config that should cause some new data members or functions to be
monkey-patched into some instances, this does in fact happen.
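
A toy sketch of that kind of structural test, assuming a hypothetical
config-driven class assembled with type(...) (all names here are invented):

```python
# Mix-in members that may be attached depending on run-time config.
MIXINS = {
    "audited": {"audit_log": lambda self: f"audit:{self.name}"},
    "cached":  {"cache_key": lambda self: f"cache:{self.name}"},
}

def build_model(config):
    # Dynamically assemble a class from the mix-ins named in the config --
    # the kind of thing the parent comment does with type(...).
    namespace = {"__init__": lambda self, name: setattr(self, "name", name)}
    for feature in config:
        namespace.update(MIXINS[feature])
    return type("Model", (), namespace)

# The "structural" unit test: given this config, these members must exist.
Model = build_model(["audited"])
m = Model("orders")
assert hasattr(m, "audit_log")
assert not hasattr(m, "cache_key")
assert m.audit_log() == "audit:orders"
```

Here the unit of work really is "attach these members", so inspecting the
resulting structure is testing behaviour, not implementation.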

I think the point is that in some languages, the structure of the program is
itself customizable or modifiable, and so there are functions whose unit of
work is to effect that modification, and those functions need to be tested
too.

Some examples might be checking that a certain class instance gets endowed
with a specialized context manager implementation or special data model
overrides or special iteration behavior.

I’ve experienced many cases where statically typed languages inherently
disallow this as a possibility, and you end up having to engineer huge
multiple-dispatch structures and highly genericized interfaces to get a
similar effect of dynamically selecting structure at run time, and often that
big pile of abstraction is hugely limiting and disallows you from efficiently
making big changes to it, even in cases when the programmer really can
guarantee safety and limits on complexity for their special use case.

~~~
mistrial9
extensive OO gymnastics is an acquired taste, and best practiced by someone
who is willing to deal with the dynamic mess that will inevitably result in a
mature code base. Nothing you say is wrong, yet _none_ of this is necessary
to solve the vast majority of actual problems.

~~~
mlthoughts2018
I’d say the static typing analogue to the workflow I described is perhaps the
canonical OO gymnastics and leads to much worse messes, thoughtless
regurgitation of “design patterns” and wasted hours getting some tower of
abstraction in place for something that you can just do in dynamic languages
and make it perfectly safe and move on.

“The dynamic mess” is mostly a myth we tell ourselves because many dominant
languages are predicated on there needing to be actual material benefit to
type safety. In reality, bad use of abstraction will cause a mess no matter
which paradigm you use, but dynamic languages at least give you the option to
eschew the bullshit and just solve the problem, while many mainstream
statically typed languages, even functional languages, just give you pattern
recipes that _guarantee_ you’ll get lost in a sea of value-destructive
abstraction and end up using a bazooka for every nail.

~~~
clhodapp
That hasn't been my experience at all. Indeed, I have found static typing
to give me superpowers by letting me quickly see what kinds of data and
transformations I have access to, constraining out possibilities that I'd
otherwise need to test for, and codegenning away boilerplate.

~~~
mlthoughts2018
A good example of why your experience doesn’t generalize is to compare the
severe limitations of breeze linear algebra for Scala with numpy in Python.

Breeze’s type safety creates this utterly needless hierarchy of abstraction
that severely handicaps you from doing advanced elementwise operations,
particularly with automatic broadcasting.

With ndarrays in numpy, you can pretty much do whatever crazy broadcast
operations you want, with no requirement to manually promote things to “the
right” shape or type.

Breeze gives you “safety” at the cost of not being able to do anything with
it, and requiring huge pains to refactor generic code to add functionality. I
find this is exactly how it plays out with most large statically typed
systems, even with fancy type class patterns in FP.

------
lifeisstillgood
_Overall, try to constantly ask yourself a question: does this test verify
some business requirement?_

A pretty good guide. Analogous to "test the interfaces / contract"

------
KKKKkkkk1
_Overall, try to constantly ask yourself a question: does this test verify
some business requirement? If the answer is no, remove it. The most valuable
tests are always the tests that have at least some connection to the business
requirements your code base is ought to address. Not surprisingly, such tests
also fall into the formal definition I brought in the beginning of this
article: they are likely to give a protection against regression errors and
they are unlikely to turn red without a good reason._

Easier said than done. Let's say I'm writing a game AI and I wrote a test that
confirms that ogres can eat humans but cannot eat other ogres. Is this testing
a business requirement?

~~~
sudhirj
Yes, that’s exactly what it is. If your business is selling a game that
works, ogres not eating other ogres is a requirement for your game to be
considered bug free and sell well. (YMMV; depending on who your audience is,
ogre cannibalism might also be a fun feature.)

This is as opposed to testing that the eat method on the ogre only
accepts an instance of the Human class. If you do that you’ll have a brittle
system on your hands, and when your ogres decide to eat horses (business
decision) you’ll have to change a lot of tests and will instead procrastinate
by writing a blog post about why TDD sucks.

~~~
domlebo70
A good type system makes the 2nd change you proposed a joy to make.

~~~
sudhirj
Oh, absolutely. A good type system also makes it unnecessary to write tests
asserting that your implementation uses exact / concrete types. A good type
system would allow you to have a clean and sane 'Eatable' concept (interface /
trait / protocol) that you can use and assert instead.
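
A sketch of that 'Eatable' idea using Python's typing.Protocol (the classes
and the nutrition attribute are invented for illustration):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Eatable(Protocol):
    nutrition: int  # anything edible just needs to expose this

class Human:
    nutrition = 10

class Horse:
    nutrition = 25

class Ogre:
    def __init__(self):
        self.hunger = 0
    def eat(self, prey: Eatable) -> None:
        # Behaviour, not structure: we never check prey's concrete class.
        self.hunger -= prey.nutrition

ogre = Ogre()
ogre.eat(Human())            # behaviour test: hunger goes down
assert ogre.hunger == -10
ogre.eat(Horse())            # new prey type: no test rewrite needed
assert ogre.hunger == -35
assert not isinstance(ogre, Eatable)  # ogres expose no nutrition
```

When the business decides ogres eat horses too, nothing about the tests for
eating behaviour has to change; only a new Eatable implementation is added.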

------
nevir
It helps to ask "what is most likely to change about the code I am testing?"
when writing unit tests. Try to avoid testing those things, and instead focus
on testing the code's behavior (quack)

~~~
hinkley
And then some pinhead starts cheerleading 100% code coverage. Usually long
before you’ve gotten your test quality high enough to be sustainable.

~~~
adrianratnapala
What I have trouble with is even gauging the quality of existing tests.

~~~
adrianN
Mutation testing is a good way to assess test quality. You ideally want every
mutation to be caught by a very small number of tests.
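
A hand-rolled sketch of the idea (real tools like PIT or Stryker generate
the mutants automatically; the discount function here is invented):

```python
# Apply a small change ("mutant") to the code under test and check that
# at least one test fails -- i.e. the suite "kills" the mutant.

def make_discount(threshold):
    def discount(total):
        return total * 0.9 if total >= threshold else total
    return discount

def suite_passes(discount):
    # Returns True if every test in the (tiny) suite passes.
    try:
        assert discount(50) == 50       # below threshold: no discount
        assert discount(100) == 90.0    # at threshold: 10% off
        return True
    except AssertionError:
        return False

original = make_discount(100)
mutant = make_discount(101)      # boundary mutation, like `>=` -> `>`
assert suite_passes(original)    # suite passes on the real code
assert not suite_passes(mutant)  # suite kills the mutant: good tests
```

A surviving mutant would point at behaviour the suite never actually pins
down, which is exactly the test-quality signal being asked about.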

~~~
aiCeivi9
As far as I can tell, pi-test can't scale even to medium-sized projects; its
time complexity is close to (unit tests * classes). Are there other tools
(maybe for other languages) that deal with this better, without the need to
split the code into smaller parts?

------
jmartrican
Either you KNOW you didn't introduce any new bugs or you don't know and are
relying on hope. Testing makes the difference. I really do not care how you
arrive at 100% certainty, just get there. If it means testing some structural
part of your code (and other unit testing), or it means setting up integration
tests that do end-to-end tests on live systems, just get there. We can talk
about increasing efficiency in this process as a team. For some systems it
might be easier to do integration tests; for others it's more efficient to do
unit tests.

------
gt_
I’ve been learning testing the last few months (been programming around a
year) and this logic seems pretty apparent, and also fairly impossible to
adhere to.

More so, this idea of TDD seems a little ridiculous so far, and structural
testing is why, but I wonder if this is just because I’m going through a
period of wanting to refactor a lot while building something.

TDD is clearly appropriate for adding features, but when architecting central
components, I think it’s cost me a lot of time. Some of that time is writing
and verifying extra tests but most of it is reviewing a mountain of tests each
time I modify architecture.

Is this just a learning phase? Or is there a long-term lesson to learn here?

~~~
eropple
Testing basically breaks down into the following types (and you can flavor
them however you want, but take this as a starting point):

\- test-last testing to test the implementation you just wrote

\- test-first testing to test the implementation you're gonna write

\- test-driven design, where the tests reflect the specification and are
applied to the implementation

So, with these on hand, here's my litmus test for TDD:

Do I know what I'm building _and why_?

You can't meaningfully do test-driven design if you don't know the what _and_
the why. TDD tests need to encode "the business cares about this", and IME,
the questions you are asking often arise when somebody doesn't really yet know
what they're building. You're figuring it out. And that is okay, too!

I'm not a TDD devotee (in that I would love to write code in a TDD fashion
much more than I actually _do_ , but I think it's probably the right answer
for most situations where the unknowns have been solved or pushed out) and I
think that you're describing a very common situation where writing tests while
you're writing code, or maybe even after--though this is often fraught--is
totally fine. I regularly build systems by writing tests to exercise bits and
pieces of them in lieu of opening a REPL or running it through a command line;
the Test Cops aren't gonna come get you if this works better during your
exploratory phase. (It does mean your exploratory phase has some useful
artifacts when it's done, though.)

Put another way: if I have a spec, I can do TDD. If I don't, I can't, and I
don't lose sleep over it.

~~~
swish_bob
If you don't know what you're doing, how can you do it?

I don't see how this is a TDD thing or not ...

~~~
eropple
You've never written exploratory code before?

