The sad state of test frameworks (github.com)
59 points by codr4life 35 days ago | 33 comments

I have had many bad experiences with DIY test frameworks. Please think twice about doing this, especially for something others might use. The problems are analogous to developing software without best practices.

At some point, that simple framework won't do what you need it to, so it grows. And grows. Specificity and "clever"/"smart" frameworks are initially awesome to avoid boilerplate. But the more specific you make it, the quicker you hit these limits. The more clever you make it, the less comprehensible and more brittle the tests are. Lack of documentation is always a problem, as is predicting what such a framework needs to support. For the most part, it will just end up an unmaintained mess that someone down the line will have to throw out, and rewrite all the tests.

For acceptance testing, RobotFramework seems like overkill, and the syntax is weird. But it's a super effective generic testing framework, and suits the needs of thousands of projects. It's not perfect and you'll probably never love it, but it'll get the job done, no matter how big the project.

I agree with the bad state of C unit test frameworks. We had to figure out how to add unit testing to C code, and in the end we just went with Google Test and all the headache that C++ brings with it. That was a while ago. After a quick search, I think I'd give µnit a try first.

We couldn't disagree more if we tried ;) I've had way worse experiences with over-engineered, off-the-shelf, general-purpose solutions; there is plenty of room for simplification. And there's nothing wrong with owning your code, especially for something as fundamental as testing. The framework presented in the post clocks in at 300 lines of C; there's not much to document, really. I would prefer spending my energy elsewhere, but µnit and others lack what I consider fundamental features and try too hard to steal the show.

Okay, concrete examples then. I checked out your libc4 repository, and tried to build it. It doesn't even build, I get "error: conflicting types for 'c4suite_run'". (The code itself is a great mix of tabs and spaces with trailing whitespace for days, which to me is a red flag that it wasn't developed with attention to detail.)

The reason I was even trying to build the code is to get the output `./tests` produces, so I could rant about how difficult it would be for me to integrate this into e.g. Jenkins with decent feedback of which tests fail.

IMO, teamwork is essential, and I'd rather not work with somebody who is happy to put the burden on other developers just so he can save a bit of time by rolling his own. Thanks but no thanks.

I prefer tabs, but it's not a religious thing for me. I don't get any errors building the library, and I never saw that error; did you check the signatures?

Jenkins integration is obviously not going to work as that was never the goal. I think CI is mostly a cover up for failing culture, an excuse to waste more time on ceremony.

I'm glad you got your rant.

For C and C++, what's wrong with using assert() in an integration_test() function or many <func>_test() functions, called when passed a --test flag or compiled as a separate binary?

For me, benchmarking isn't really useful unless it's a profiler in a real world use situation. Otherwise you have no idea how often each API function is called.

For C++ I find both Google Test and Boost Unit Test Framework pretty usable. They both have some quirks, but IMHO no show-stoppers. I prefer these to custom solutions because I don't want to maintain yet another tool but to focus on my problem domain, and it is nice when some fellow developers actually have experience with these tools.

The posted approach clocks in at 300 lines of C, and comes with the freedom to add whatever features you want. Testing is as fundamental as it gets; and unless you're aiming to write the One True Testing Framework, it's at most a day's work to get one up and running. Google Test and Boost Unit try too hard to be everything for everyone, which makes them too complex, while failing to add anything that was missing from plain functions and asserts.

I prefer Boost.UnitTest to Google Test, which I also like. Boost.UnitTest has very nice sugar, but the big problem is the very slow compilation times.

That's why I like Catch [0]

Super easy to use, header only so no need to compile anything beforehand, no external dependencies, reasonably fast compilation time, and does everything I need.

0: https://github.com/philsquared/Catch

Funny... I dumped Catch after a couple of days due to the terrible build times ;)

The fact it took over my programs' command line options and wouldn't give 'em back didn't help either. (Many of my tests are data-driven and use command line options to indicate where to read the data from - I already have a library to handle this! I didn't see - and still don't - why a test framework should be getting involved.)

I actually tried out Catch last night and dumped it almost immediately for the exact same reason. I was blown away by how slow it was compared to the rest of my (admittedly small) program in terms of build times.

I found using boost was a smaller hit to build times than Catch, and the suggestion of building catch in a separate translation unit didn't really help.

I'm guessing if your build times are already godawful you may not notice, but I sure as shit did.

Ultimately I found and went with dessert: https://github.com/r-lyeh/dessert

It's stupidly simple and doesn't do a lot of things that other C++ testing frameworks do, but it builds lightning quick and does the 2 things I expect out of a testing framework.

1. it tests, and 2. it reports failures.

I'll probably run into severe limitations that I can't deal with/work around, but for now I'm pretty happy with it and even if I end up using something else in 6 months, I'll still consider it worthwhile.

I think that's the main rub: think about the maintainers. Using a bespoke test framework is often a way of introducing conflict, no matter how good a coder you are. As Jeff Atwood says, imagine the maintainer being a psychopath.. https://blog.codinghorror.com/coding-for-violent-psychopaths...

Would you rather have your psychopath stalkers tearing their hair out over the Boost dependency you dragged in? Once written it stays mostly the same, there is not much to maintain. I can't see why rolling your own framework would introduce conflicts; unless you're in an environment where writing code to solve problems is frowned upon, but then you have worse problems.

Nothing. That's the way I usually test my C code.

It has the drawback that one failing test aborts the whole program. The error messages are not the most informative, but it does the job.

Although hacky,

    assert(test_foo() /* Checks the sanity of Foo */);
does the job for the error messages, and replacing abort() with a custom function like `void check(int condition, const char *format, ...);` can make it not abort the program on an error.
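A sketch of what such a `check` might look like (my guess at an implementation of that signature; the failure counter is my own addition):

```c
#include <stdarg.h>
#include <stdio.h>

static int check_failures = 0;

/* assert-style check that reports the failure and keeps going
   instead of calling abort() */
void check(int condition, const char *format, ...) {
    if (condition)
        return;
    va_list ap;
    va_start(ap, format);
    fputs("check failed: ", stderr);
    vfprintf(stderr, format, ap);
    fputc('\n', stderr);
    va_end(ap);
    check_failures++;
}
```

The test driver can then exit with `check_failures` as its status, so the build still fails without losing the remaining tests.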

I prefer the regular assert() though, because I want the build to fail by an aborting test program, and all I care about is the line number of the failure. Another benefit is that abort() will pause gdb in the exact place that it crashed, with the stack intact if you want to inspect it.

You can also use the comma operator to document your assert()s, if you don't like using comments.

    assert(("test_foo() should be true", test_foo()));

I have used often and also seen in the wild the following form:

   assert(condition && "message");

This is easy enough to solve on Unix... simply fork() before each test. Just make sure that the master is single-threaded or this won't work.
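A sketch of that fork() approach (POSIX only; the function names are mine):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* run one test in a forked child, so an abort()ing assert
   kills only that child, not the whole test run */
static int run_isolated(void (*test)(void), const char *name) {
    pid_t pid = fork();
    if (pid < 0)
        return 0;          /* fork failed, count it as a failure */
    if (pid == 0) {        /* child: run the test, exit 0 on success */
        test();
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    fprintf(stderr, "%s: %s\n", name, ok ? "PASS" : "FAIL");
    return ok;
}

/* sample tests for illustration */
static void sample_pass(void) {}
static void sample_fail(void) { abort(); }
```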

Also an option is to siglongjmp out of the abort signal handler.

I've found it worthwhile to add a thin facade on top of assert to support out-of-band reporting of errors even when assertions are disabled via NDEBUG: https://github.com/codr4life/libc4/blob/master/src/c4/assert...

There's nothing wrong with assert, that's what I use for checking conditions as well. The framework posted is just a thin facade on top of regular functions and asserts to provide fixtures, dynamic grouping and benchmarks. Knowing how long a specific set of tests take to run for N repetitions is useful information. Since I wrote the test, I know exactly how many times the tested API is called. This is about spotting problems and trends early, once I find something interesting I'll drop down to profiling.

Is this about the sad state of test frameworks or about the sad state of C test frameworks? There's a reference to JUnit I don't understand. JUnit provides the ability to aggregate tests in suites (strict hierarchies wtf?).

Benchmarking may be a different concern. I know you can do something with rules, but it may require more time to set up.

By the way: creating a small, self developed tool that just does what you need is sometimes a good idea, but it doesn't mean everything else is shit. Any library serving a large user base will serve somebody better than others.

The only thing you can say in JUnit is that this test is part of that suite. My tests have several aspects that I'm interested in using for triggering. Sometimes I want to run all DB tests, regardless of which suite they're in. That's what I mean by strict hierarchies. Part of the point I'm trying to make here is that once you've written a couple of test suites, you probably have a pretty good idea what tools you would prefer; and it's totally OK to dive in and have a go; there is nothing sacred or magic about test frameworks that rules out rolling your own. When it comes to test frameworks, most of what I've seen so far is over-engineered crap in my eyes; that is my authentic experience from 32 years of writing code in umpteen languages.

I'm not sure if it's what the author was getting at, but you can't order test execution in JUnit, so you can't chain together integration tests.

Of course, that's why you use other tools like TestNG. Unit tests aren't _units_ if they're interdependent.

edit Looks like the author was trying to execute multiple overlapping test suites. Haven't tried that.

JUnit 5 [1], which is currently under development, will include the ability to control test order. The whole framework was redesigned with a consistent model for extending almost all aspects of the system.

[1] http://junit.org/junit5/

I looked at JUnit 5 recently and it's super super promising.

The website design does rather give the impression that it's done. I know that if I read the text, I learn that it's not, but my brain categorises those paras as marketing fluff and ignores them.

But it looks very much like what I want from Java testing. My views have, admittedly, been warped by RSpec and Jasmine.

I use a roll-yer-own test framework that (I believe) strikes the perfect balance between setup costs, simplicity and flexibility. It's a small utility that just looks through a given directory tree, finds folders called */test, compiles and runs each .cpp file in that folder (taking cues from formatted comments inside the file if additional files are required), and compares the output to a stored 'good' output for that file, flagging an error if the files differ.

When developing code, you'll almost always end up writing some ad-hoc testing code anyway: Generate some input data, call the code you're testing, printf() some results. At development time, you read this output to verify your code's working well. My system just makes this a bit more methodical and saves a 'known good' result - if the output changes then the code may be wrong. Once you've fixed the issue (or determined that the new behaviour is in fact correct) you update the known-good output.

Dead simple, basically maintains itself, still catches any problems that the tests would catch.

If you like this approach, you may like Testscript even better: https://build2.org/build2/doc/build2-testscript-manual.xhtml

Interesting - I came to a similar conclusion from a completely different angle and blogged about it the other day:


I've similarly rolled my own bash-based solutions, as testing is really as simple as running a script and getting a '0' or non-zero exit code.
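Driving that from C is equally small; a sketch (the helper name is mine) that treats a test script's exit status as the verdict:

```c
#include <stdlib.h>
#include <sys/wait.h>

/* run a test script through the shell; 0 means the test passed,
   any other exit status is its failure code, -1 means we couldn't run it */
static int run_script(const char *cmd) {
    int status = system(cmd);
    if (status == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```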

I rolled my own a while back https://github.com/keithn/seatest

I kept it very vanilla C and avoided having to maintain a suite structure in favour of a function call system, which can be used on embedded systems without many resources.

I should probably do more to maintain that project and improve it as I have outstanding PRs it seems!

I use Google Test, which integrates nicely with CLion and Jenkins.

I haven't fully documented or finished work on it yet (but it's now running and working for me), but apparently like everyone else I wasn't very impressed with testing frameworks, so decided to attempt to roll my own for Common Lisp rather than C:

See https://github.com/DJMelksham/testy/tree/master if anyone is interested, though I'm not sure how useful it will be at this stage.

Like the article author, I settled on tags for tests rather than explicit hierarchies.

Also somewhat like the author, I care about benchmarks to a degree so amongst other things, each test keeps its latest timing and results stats: and because tests serialise to text/lisp code, each test-code/test-run can be version controlled with a project.

My project has some slight diversions/additions to the article:

1. Being able to capture previously evaluated code and returned results from the REPL and automatically package up the form and the result as a test. Obviously not as applicable to C, but I found myself interactively trying to get a function to work, then capturing the result upon success as a regression test, rather than the usual "design test up front" from TDD (although that's possible too).

2. I had a long long internal debate about the philosophy of fixtures. The concept of environments/fixtures in which you could run a series of tests was in fact my initial plan, and although I've backed out of it, I think I could theoretically add such a thing back in, but I'm not sure I want to now...

It seems to me that by supplying fixtures/environments in which you run multiple tests, you gain "not paying the setup cost/don't repeat yourself" and "flexibility of running tests in arbitrary environments". What I considered the cost was the tendency to move away from true test independence (the benefit of which is easier naive multi-threading of running the test suite), and no longer having full referential transparency/documentation of a test. Test 2784 says it failed in the last run. Is that because I ran it in context A or context B... and what else was running with it and what caused the failure? What is the performance/timing actually measuring? Sort of like lexical scoping, I wanted the reason for the test's values and failures to be as much as possible "in the test source" and nowhere else.

This philosophy of mine creates obvious limitations: to try to get around them a bit, while keeping the benefits, I made two more design choices.

3. Composable tests: New tests can be defined as the combination of already existing tests.

4. Vectorise test functions!: Most important functions work on vectors or sets of tests as well as individual tests. The function (run-tests), for instance, runs a vector of tests passed to it. The functions (all-tests), (failed-tests), (passed-tests), (get-tests-from-tags), etc, all return the vectors of tests you would expect from their names, which can then be passed on to the other functions that map across vectors of tests. (print-stats), for example, can print statistics on the entire test suite (because by default its input is the result of the function (all-tests)), but it also accepts any arbitrary set of tests, be it passed-tests, failed-tests or a particular set marked by a specific tag. And because each test is self-contained, all results/reports can be generated by just combining the internal results/reports of each individual test in the vector. Copy a set of old tests as a basis for new ones, compose multiple tests or functions into one, and/or map new settings or properties across an arbitrary group of tests.

Anyway, I'm curious to hear other people's experiences and design choices in this regard.
