
The sad state of test frameworks - codr4life
https://github.com/codr4life/vicsydev/blob/master/test_frameworks.md
======
guitarbill
I have had many bad experiences with DIY test frameworks. Please think twice about
doing this, especially for something others might use. The problems are
analogous to developing software without best practices.

At some point, that simple framework won't do what you need it to, so it
grows. And grows. Specific and "clever"/"smart" frameworks are initially
awesome for avoiding boilerplate. But the more specific you make it, the
sooner you hit its limits. The more clever you make it, the less comprehensible
and more brittle the tests become. Lack of documentation is always a problem,
as is predicting what such a framework needs to support. For the most part, it
will just end up an unmaintained mess that someone down the line will have to
throw out, rewriting all the tests.

For acceptance testing, RobotFramework seems like overkill, and the syntax is
weird. But it's a super effective generic testing framework, and suits the
needs of thousands of projects. It's not perfect and you'll probably never
love it, but it'll get the job done, no matter how big the project.

I agree with the bad state of C unit test frameworks. We had to figure out how
to add unit testing to C code, and in the end we just went with Google Test
and all the headache that C++ brings with it. That was a while ago. After a
quick search, I think I'd give µnit a try first.

~~~
codr4life
We couldn't disagree more if we tried ;) I've had way worse experiences with
over-engineered, off-the-shelf, general-purpose solutions; there is plenty of
room for simplification. And there's nothing wrong with owning your code,
especially for something as fundamental as testing. The framework presented in
the post clocks in at 300 lines of C, there's not much to document really. I
would prefer spending my energy elsewhere, but uUnit and others lack what I
consider fundamental features and try too hard to steal the show.

~~~
guitarbill
Okay, concrete examples then. I checked out your libc4 repository and tried
to build it. It doesn't even build; I get "error: conflicting types for
'c4suite_run'". (The code itself is a great mix of tabs and spaces, with
trailing whitespace for days, which to me is a red flag that it wasn't
developed with attention to detail.)

The reason I was even trying to build the code was to get the output `./tests`
produces, so I could rant about how difficult it would be to integrate this
into e.g. Jenkins with decent feedback about which tests fail.

IMO, teamwork is essential, and I'd rather not work with somebody who is happy
to put the burden on other developers just so he can save a bit of time by
rolling his own. Thanks but no thanks.

~~~
codr4life
I prefer tabs, but it's not a religious thing for me. I don't get any errors
building the library, and I never saw that error; did you check the
signatures?

Jenkins integration is obviously not going to work, as that was never the
goal. I think CI is mostly a cover-up for a failing culture, an excuse to
waste more time on ceremony.

I'm glad you got your rant.

------
vortico
For C and C++, what's wrong with using assert() in an integration_test()
function or many <func>_test() functions, called when passed a --test flag or
compiled as a separate binary?
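
Something like this minimal sketch is all it takes (add() and the flag
handling are made up for illustration; build without -DNDEBUG so the asserts
stay live):

    // self_test.cpp - minimal sketch of assert()-based self-tests.
    #include <cassert>
    #include <cstdio>
    #include <cstring>

    static int add(int a, int b) { return a + b; }  // code under test (hypothetical)

    static void add_test() {
        assert(add(2, 2) == 4);
        assert(add(-1, 1) == 0);
    }

    int main(int argc, char** argv) {
        if (argc > 1 && std::strcmp(argv[1], "--test") == 0) {
            add_test();  // one <func>_test() call per tested function
            std::puts("all tests passed");
            return 0;
        }
        // ... normal program behaviour ...
        return 0;
    }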

For me, benchmarking isn't really useful unless it's done with a profiler in a
real-world use situation. Otherwise you have no idea how often each API
function is called.

~~~
koja86
For C++ I find both Google Test and the Boost Unit Test Framework pretty
usable. They both have some quirks, but IMHO no show-stoppers. I prefer these
to custom solutions because I don't want to maintain yet another tool; I'd
rather focus on my problem domain, and it's nice when fellow developers
actually have experience with these tools.

~~~
gpderetta
I prefer Boost.UnitTest to Google Test, though I like both. Boost.UnitTest
has very nice sugar, but its big problem is very slow compilation times.

~~~
imron
That's why I like Catch [0].

Super easy to use, header only so no need to compile anything beforehand, no
external dependencies, reasonably fast compilation time, and does everything I
need.
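
For reference, a complete Catch test binary really is this small (factorial
here is the stock tutorial example; catch.hpp is the single header):

    // tests.cpp - a whole Catch test binary in one file.
    #define CATCH_CONFIG_MAIN  // ask Catch to generate main() for us
    #include "catch.hpp"

    static unsigned factorial(unsigned n) {  // code under test
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    TEST_CASE("factorials are computed", "[factorial]") {
        REQUIRE(factorial(1) == 1);
        REQUIRE(factorial(3) == 6);
        REQUIRE(factorial(10) == 3628800);
    }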

0: [https://github.com/philsquared/Catch](https://github.com/philsquared/Catch)

~~~
to3m
Funny... I dumped Catch after a couple of days due to the terrible build times
;)

The fact it took over my programs' command line options and wouldn't give 'em
back didn't help either. (Many of my tests are data-driven and use command
line options to indicate where to read the data from - I already have a
library to handle this! I didn't see - and still don't - why a test framework
should be getting involved.)
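
(That said, Catch can be told to keep its hands off: define CATCH_CONFIG_RUNNER
instead of CATCH_CONFIG_MAIN and you write main() yourself, passing Catch only
the arguments you want it to see. A minimal sketch:)

    // main.cpp - owning main() so Catch doesn't grab the command line.
    #define CATCH_CONFIG_RUNNER  // we supply main() ourselves
    #include "catch.hpp"

    int main(int argc, char* argv[]) {
        // peel off application-specific options (e.g. data-file paths) here,
        // then hand only the remainder to Catch
        return Catch::Session().run(argc, argv);
    }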

~~~
braveo
I actually tried out Catch last night and dumped it almost immediately for the
exact same reason. I was blown away by how slow it was compared to the rest of
my (admittedly small) program in terms of build times.

I found using Boost was a smaller hit to build times than Catch, and the
suggestion of building Catch in a separate translation unit didn't really
help.

I'm guessing if your build times are already godawful you may not notice, but
I sure as shit did.

Ultimately I found and went with dessert:
[https://github.com/r-lyeh/dessert](https://github.com/r-lyeh/dessert)

It's stupidly simple and doesn't do a lot of the things that other C++ testing
frameworks do, but it builds lightning quick and does the 2 things I expect
out of a testing framework:

1. it tests, and 2. it reports failures.

I'll probably run into severe limitations that I can't deal with/work around,
but for now I'm pretty happy with it and even if I end up using something else
in 6 months, I'll still consider it worthwhile.

------
alanfranzoni
Is this about the sad state of test frameworks, or about the sad state of C
test frameworks? There's a reference to JUnit I don't understand. JUnit
provides the ability to aggregate tests in suites (strict hierarchies, wtf?).

Benchmarking may be a different concern. I know you can do something with
JUnit rules, but it may require more time to set up.

By the way: creating a small, self-developed tool that just does what you need
is sometimes a good idea, but that doesn't mean everything else is shit. Any
library serving a large user base will serve some users better than others.

~~~
NickNameNick
I'm not sure if it's what the author was getting at, but you can't order test
execution in JUnit, so you can't chain together integration tests.

Of course, that's why you use other tools like TestNG. Unit tests aren't
_units_ if they're interdependent.

 _edit_ Looks like the author was trying to execute multiple overlapping test
suites. Haven't tried that.

~~~
smoyer
JUnit 5 [1], which is currently under development, will include the ability
to control test order. The whole framework has been redesigned with a
consistent model for extending almost all aspects of the system.

[1] [http://junit.org/junit5/](http://junit.org/junit5/)

~~~
jacques_chester
I looked at JUnit 5 recently and it's super super promising.

The website design does rather give the impression that it's done. I know that
if I read the text, I learn that it's not, but my brain categorises those
paras as marketing fluff and ignores them.

But it looks very much like what I want from Java testing. My views have,
admittedly, been warped by RSpec and Jasmine.

------
taneq
I use a roll-yer-own test framework that (I believe) strikes the perfect
balance between setup costs, simplicity and flexibility. It's a small utility
that just looks through a given directory tree, finds folders called */test,
compiles and runs each .cpp file in that folder (taking cues from formatted
comments inside the file if additional files are required), and compares the
output to a stored 'good' output for that file, flagging an error if the files
differ.

When developing code, you'll almost always end up writing some ad-hoc testing
code anyway: Generate some input data, call the code you're testing, printf()
some results. At development time, you read this output to verify your code's
working well. My system just makes this a bit more methodical and saves a
'known good' result - if the output changes then the code may be wrong. Once
you've fixed the issue (or determined that the new behaviour is in fact
correct) you update the known-good output.

Dead simple, basically maintains itself, and still catches the problems that
hand-written tests would catch.
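
In outline, the runner is something like this C++17 sketch (the directory
layout, compiler command, and .good/.out naming are assumptions here, and it
skips the formatted-comment handling):

    // golden_runner.cpp (C++17) - find */test/*.cpp, compile and run each one,
    // and diff its stdout against a stored <name>.good file next to it.
    #include <cstdlib>
    #include <filesystem>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    namespace fs = std::filesystem;

    static std::string slurp(const fs::path& p) {
        std::ifstream in(p);
        std::ostringstream ss;
        ss << in.rdbuf();
        return ss.str();
    }

    int main() {
        int failures = 0;
        for (auto& e : fs::recursive_directory_iterator(".")) {
            if (!e.is_regular_file() || e.path().extension() != ".cpp" ||
                e.path().parent_path().filename() != "test")
                continue;
            fs::path src = e.path();
            fs::path out = src;  out.replace_extension(".out");
            fs::path good = src; good.replace_extension(".good");
            // compile and run, capturing stdout (compiler invocation is an assumption)
            std::string cmd = "c++ -o test_bin " + src.string() +
                              " && ./test_bin > " + out.string();
            if (std::system(cmd.c_str()) != 0 || slurp(out) != slurp(good)) {
                std::cerr << "FAIL: " << src << "\n";
                ++failures;
            }
        }
        return failures ? 1 : 0;
    }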

~~~
boris
If you like this approach, you may like Testscript even better:
[https://build2.org/build2/doc/build2-testscript-manual.xhtml](https://build2.org/build2/doc/build2-testscript-manual.xhtml)

------
zwischenzug
Interesting - I came to a similar conclusion from a completely different angle
and blogged about it the other day:

[https://zwischenzugs.wordpress.com/2017/03/18/clustered-vm-t...](https://zwischenzugs.wordpress.com/2017/03/18/clustered-vm-testing-how-to/)

I've similarly rolled my own bash-based solutions, as testing is really as
simple as running a script and getting a '0' or non-zero exit code.
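
The whole contract fits in a handful of lines; sketched in C++ here to match
the rest of the thread (the script paths are made up):

    // run_tests.cpp - run each test script and trust its exit code.
    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        const std::vector<std::string> scripts = {
            "./tests/smoke.sh", "./tests/api.sh"};  // hypothetical paths
        int failed = 0;
        for (const auto& s : scripts)
            if (std::system(s.c_str()) != 0) {  // non-zero exit = failure
                std::cerr << "FAIL: " << s << "\n";
                ++failed;
            }
        return failed ? 1 : 0;
    }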

------
keithnz
I rolled my own a while back
[https://github.com/keithn/seatest](https://github.com/keithn/seatest)

I kept it very vanilla C and avoided having to maintain a suite structure in
favour of a function-call system that can be used on embedded systems with
limited resources.

I should probably do more to maintain and improve that project, as it seems I
have outstanding PRs!

------
partycoder
I use Google Test, which integrates nicely with CLion and Jenkins.

------
ACow_Adonis
I haven't fully documented or finished work on it yet (but it's now running
and working for me), but, apparently like everyone else, I wasn't very
impressed with testing frameworks, so I decided to attempt to roll my own, for
Common Lisp rather than C:

See
[https://github.com/DJMelksham/testy/tree/master](https://github.com/DJMelksham/testy/tree/master)
if anyone is interested, though I'm not sure how useful it will be at this
stage.

Like the article author, I settled on tags for tests rather than explicit
hierarchies.

Also somewhat like the author, I care about benchmarks to a degree, so,
amongst other things, each test keeps its latest timing and result stats; and
because tests serialise to text/Lisp code, each test's code and runs can be
version-controlled with a project.

My project has some slight diversions/additions to the article:

1. Being able to capture previously evaluated code and returned results from
the REPL and automatically package up the form and the result as a test.
Obviously not as applicable to C, but I found myself interactively trying to
get a function to work, then capturing the result upon success as a regression
test, rather than the usual "design tests up front" of TDD (although that's
possible too).

2. I had a long, long internal debate about the philosophy of fixtures. The
concept of environments/fixtures in which you could run a series of tests was
in fact my initial plan, and although I've backed out of it, I think I could
theoretically add such a thing back in; I'm just not sure I want to now...

It seems to me that by supplying fixtures/environments in which you run
multiple tests, you gain "not paying the setup cost/don't repeat yourself" and
"flexibility of running tests in arbitrary environments". What I considered
the cost was the tendency to move away from true test independence (the
benefit of which is easier naive multi-threading of the test suite), and the
loss of full referential transparency/documentation of a test. Test 2784 says
it failed in the last run. Is that because I ran it in context A or context
B... and what else was running with it, and what caused the failure? What is
the performance/timing actually measuring? Somewhat like lexical scoping, I
wanted the reason for a test's values and failures to live, as much as
possible, "in the test source" and nowhere else.

This philosophy of mine creates obvious limitations; to get around them a bit
while keeping the benefits, I made two more design choices.

3. Composable tests: new tests can be defined as the combination of already
existing tests.

4. Vectorise test functions! Most important functions work on vectors or sets
of tests as well as on individual tests. The function (run-tests), for
instance, runs a vector of tests passed to it. The functions (all-tests),
(failed-tests), (passed-tests), (get-tests-from-tags), etc. all return the
vectors of tests you would expect from their names, which can then be passed
on to the other functions that map across vectors of tests.

(print-stats), for example, can print statistics on the entire test suite
(because by default its input is the result of (all-tests)), but it also
accepts any arbitrary set of tests, be it the passed tests, the failed tests,
or a particular set marked by a specific tag. And because each test is
self-contained, all results/reports can be generated by just combining the
internal results/reports of each individual test in the vector. Copy a set of
old tests as a basis for new ones, compose multiple tests or functions into
one, and/or map new settings or properties across an arbitrary group of tests.
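
To give the flavour, here's a rough C++17 rendering of that shape (all names
are invented for illustration; the real thing is Common Lisp, and this is a
sketch of the idea, not testy's actual API):

    // tests_as_values.cpp - "vectorised" test operations: every function
    // consumes and/or returns a vector of tests, so calls chain.
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Test {
        std::string name;
        std::vector<std::string> tags;
        std::function<bool()> body;
        bool passed = false;
    };

    std::vector<Test> get_tests_from_tag(const std::vector<Test>& in,
                                         const std::string& tag) {
        std::vector<Test> out;
        for (const auto& t : in)
            for (const auto& g : t.tags)
                if (g == tag) { out.push_back(t); break; }
        return out;
    }

    std::vector<Test> run_tests(std::vector<Test> ts) {
        for (auto& t : ts) t.passed = t.body();  // record each result on the test
        return ts;
    }

    std::vector<Test> failed_tests(const std::vector<Test>& ts) {
        std::vector<Test> out;
        for (const auto& t : ts)
            if (!t.passed) out.push_back(t);
        return out;
    }

    int main() {
        std::vector<Test> all = {
            {"adds", {"math"}, [] { return 2 + 2 == 4; }},
            {"broken", {"math"}, [] { return false; }},
        };
        // operations chain because each one returns a vector of tests
        for (const auto& t : failed_tests(run_tests(get_tests_from_tag(all, "math"))))
            std::cout << "FAIL: " << t.name << "\n";
    }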

Anyway, I'm curious to hear other people's experiences and design choices in
this regard.

