Using `make` and `git diff` for a simple and powerful test harness (chrismorgan.info)
238 points by chrismorgan 7 days ago | 46 comments

This technique is also known as characterisation testing, golden-master testing, snapshot testing, and probably other names too. I recommend looking into https://approvaltests.com/ (already mentioned by another commenter).
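The core loop is the same under all these names; a minimal sketch in shell (the `golden/` directory and `greet` names are hypothetical, and the `printf` stands in for running the real program):

```shell
# Golden-master loop: capture the program's output once as an "approved"
# file, then fail whenever the current output differs from it.
mkdir -p golden
printf 'hello world\n' > golden/greet.approved   # reviewed once, committed to git
printf 'hello world\n' > golden/greet.actual     # stand-in for the real program's output
diff -u golden/greet.approved golden/greet.actual && echo PASS
```

When the diff shows an intended change, "approving" it is just copying the actual file over the approved one and committing.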

I use characterisation testing all the time, in a perhaps unusual application: Checking the behaviour of "Page Objects" (classes used to model the application-under-test in GUI-level automated testing) when the application-under-test has changed. That's right: Automated [characterisation] tests for my automated [GUI] tests! It makes GUI testing oh so much easier to maintain. I wrote it up here: https://david.rothlis.net/pageobjects/characterisation-tests

Another thought related to characterisation testing, this time from Jeremias Roessler's talk at QAFest Ukraine 2017 (https://www.youtube.com/watch?v=f4PT_u8hjhU): Traditional automated tests (with asserts) are "blacklisting" the changes that aren't allowed, whereas a characterisation test will catch any change in behaviour, and instead you have to "whitelist" the changes that are allowed (by providing regexes or some kind of pattern to match the dynamic output that is allowed to change).
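That "whitelisting" step can be as simple as scrubbing the dynamic fields before diffing; a hypothetical sketch (the `produce` function stands in for the real program, and the timestamp format is an assumption):

```shell
# Whitelist allowed changes: replace the timestamp with a placeholder
# via sed, so only real behaviour changes show up in the diff.
produce() { echo "run at 2020-01-17T10:32:05 status=ok"; }
produce | sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]{8}/<TIMESTAMP>/' > actual.txt
echo "run at <TIMESTAMP> status=ok" > expected.txt
diff -u expected.txt actual.txt && echo PASS
```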

Thanks for the term "characterisation testing".

I started doing this a few years ago while working on a Sphinx extension that defines new directives but also customizes a few output targets (including two ~plaintext output targets).

It's not only useful for avoiding accidental output changes; it's essential for iterating with confidence that tiny changes at the code level affected the output as expected over a mostly representative fraction of the documentation set.

I didn't have a term for it at the time. I guess about a year later I bumped into snapshot testing, but that's always felt like more of a metaphor.

I recently found it useful for an approximation algorithm. Since it was an approximation, it didn't really make any hard guarantees on the output for a given input, but snapshot testing at least made it very clear when a change affected the output and to what.

What I like about this idea of Approval Testing is that I don't have to write too many tests if I know my system is working, as long as I am notified when the behaviour of my system has changed (usually in some unexpected way).

This would require auto-mocking of all subsystems to prevent side-effecting functions from sending email & SMSes.

I’m really glad to have these terms of art; now I know what to search for to find similar things. Thanks!

Fun. The rwildcard function here

    rwildcard = $(foreach d,$(wildcard $1*),$(call rwildcard,$d/,$2) $(filter $2,$d))
is very similar to one I wrote in 2011 (https://blog.jgc.org/2011/07/gnu-make-recursive-wildcard-fun...):

    rwildcard=$(foreach d,$(wildcard $1*),$(call rwildcard,$d/,$2) $(filter $(subst *,%,$2),$d))
The only difference is that I wanted to be able to use * in the pattern rather than % (hence the subst), so you can write

    $(call rwildcard,,*.c)

instead of

    $(call rwildcard,,%.c)
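For anyone following along, a minimal Makefile sketch of the subst version (the `src/` directory is hypothetical):

```make
# subst turns the shell-style "*.c" into the "%.c" that $(filter ...)
# understands, so both spellings below find the same files.
rwildcard=$(foreach d,$(wildcard $1*),$(call rwildcard,$d/,$2) $(filter $(subst *,%,$2),$d))
SOURCES_STAR    := $(call rwildcard,src/,*.c)
SOURCES_PERCENT := $(call rwildcard,src/,%.c)
```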

I believe it is a direct descendant of yours, which I’ve used from time to time over the years. I just didn’t feel the need to support *, % was enough for me.

Thanks for all the stuff you’ve written about Make: I’ve enjoyed reading quite a lot of it!

Glad these things help people. The * for wildcard was a bit of syntactic sugar to make $(call rwildcard) closer to $(wildcard).

For those who don't know, here's a list of stuff I've written over the years.


And the hard copy version: https://nostarch.com/gnumake

PS I really like the layout of your blog.

I appreciate these posts for "I just need to do X..." google-search-for-my-fix problems. But I've always found it frustrating that this is what the whole www is now: a whole lot of spread-out individual docs for individual problems, rather than a concise review of useful information (a "just the useful bits" database, if you will).

Try to learn anything in depth these days and either you're deciphering a dense yet incomplete manual, or googling until eternity. Make is a great example, as even though they have a useful manual, good luck figuring out how to apply it to your problem. Wikis don't quite cut it either. We need a different data model for knowledge sharing.

OK, but I wrote those things as "advanced" uses of make because the actual GNU Make Manual is very, very good (https://www.gnu.org/software/make/manual/) and people should read it.

I keep wondering about doing a video series explaining GNU Make from the ground up.

I went to an awful lot of trouble to apply roughly the same technique to test some code generation code [0]. Required pulling in a random dependency and hacking around for a while to make the output look right. Your solution is much slicker!

[0]: https://github.com/couchand/wayfinder/blob/1dc58a1130bc17941...

What you ended up with has two major advantages: ① it integrates with the Cargo test harness, so that Rust developers’ expectations will be met; and ② it does it in one compiler invocation. The test harness I describe would be quite slow for your sort of situation.

I’m not completely sold on using a separate crate and build.rs the way you have, but it looks like it’ll yield a good usable result. A couple of related things you might be interested in: compiletest_rs, trybuild.

A bit off-topic, but I like the tty-player you're using for the demo (and apparently have created). https://tty-player.chrismorgan.info/

I haven't come across this kind of terminal playback on the web before; it seems a bit more bandwidth-efficient and snappier than video or GIFs.

Heads up: what’s shown and run at https://tty-player.chrismorgan.info is currently out of date (though https://github.com/chris-morgan/tty-player is up to date). I wrote it years back against Web Components v0 for a project that has been shelved for a few years, and more recently updated it to Web Components v1 and Shadow DOM, because I wanted it for this article. Still need to update it to xterm.js.

The most popular thing in this space nowadays is asciinema, mostly used by embedding it from asciinema.org, but I prefer my thing.

It also lets you copy and paste directly, very cool.

it's my friday, here's a haiku for you:

    this is a nice post
    i'm not sober on hn
    fixed width code font please

The code font I use is Triplicate: https://mbtype.com/fonts/triplicate/

I use the Poly variant by default, only disabling it in places where the monospacedness matters for layout purposes, e.g. the terminal recording in this article (because of Vim), and Rust compiler output in some of my other articles. The Poly variant does things like make i, l and space a little narrower, and m a little wider, which makes casual reading more comfortable than a strict monospaced font.

I do this because I believe that monospacedness is substantially overrated in most places, and that most things actually look better not strictly monospaced. I contemplated not even using a monospace-style font at all but decided that was probably going too far for most people. (And as a Vim user I necessarily work in a monospaced text editor; but if it weren’t for that, I’d probably go full proportional.)

So I’m curious if you have further feedback on this matter and why you find fault with what I’m doing. It may influence what I do.

I think one of the best reasons to use a monospaced typeface is that it is a fairly strong and accurate signifier of code. Of course, in this case you have special highlighting for it that makes it less useful, but in general I think that it really helps. (Plus there are a couple of other, minor benefits probably not worth listing here.)

As mentioned elsewhere I've been using proportional for code and it's very nice but for a small drawback. I'd say give it a try for a couple of days and see.

I second the monospace-for-code suggestion. I guess my brain is wired to read slightly offset characters as wrong. It’s a bit unsettling.

To you and anyone else with this opinion: try disabling the `code { font-feature-settings: "ss01" 1, "ss02"; }` rule in the CSS, which will disable the Poly variant, and let me know how it feels.

It seems possible to me that it’s actually the use of a true serif monospace font (>95% of monospace fonts used these days are sans-serif, and >95% of the remainder are slab serif) that’s throwing you off, more than the strict monospacedness of it, and I’d like to try that hypothesis out.

(In early development of the visual style, I used only the font with no spacing or colour hints, but I found the monospace Triplicate too similar to the serif Equity, so that it was sometimes not quite clear enough that it was code; that was the reason why I put the background colour on inline code rather than only on code blocks, even though that wouldn’t be done in a printed manuscript, which is a style I am loosely imitating in part.)

Disabling `code { font-feature-settings: "ss01" 1, "ss02"; }` made only a marginal improvement for me. The bigger issue I had was that the typeface was too large.

Ultimately you're never going to win a discussion about typefaces because they're entirely personal preference. For example, I find most proportional fonts too narrow and harder to read, so I much prefer the typically wider glyphs of monospaced typefaces. So much so that the font I used on one of my blogs had rounder letters. I then had complaints that others found it "unreadable" and preferred something narrower.

I'm sure there will always be a sweet spot where a more-than-average number of readers will be content; however, the web would be a little duller if everyone converged on that same typeface. So I'm willing to take a marginal hit on readability (and let's be honest, the difference is almost always only marginal) for the sake of websites having their own personalities. The alternative is that people can just toggle Reader View in Firefox (or whatever the equivalent is in other browsers).

I object to this whole thread as bikeshedding; however, I happen to use proportional fonts for code (Lucida Sans Unicode on Windows) but just yesterday reverted to a monospaced font (Lucida Console).

While I much, much prefer proportional, it's simply that indented stuff after text didn't line up properly in it, e.g.

    stuff = 23
    more stuff = 99
    x = 41

(edit: sorry HN is messing up the indenting even further, but you know what I'm getting at)

Also my magit popup buffer is all ziggy-zaggy instead of properly column'ed. I can live with that. Edit - and git log which relies on fixed-width to show properly gets all bollixed.

In your case I can't see your code suffering at all from these problems, so I'm fine with it.

The niceness of proportionals may be enough for me to go back to it. I don't know yet.

Further edit: thanks for the article!

In monospace that typeface looks glorious. I'd recommend continuing to use proportional for inline keywords, though.

Interesting. I’ll give the thought time to settle and probably disable Poly for all code blocks tomorrow, leaving it on only for inline code. (That’s what I did initially in the design, but then I decided to make normal code blocks Poly as well because I preferred it so, and why not? —But it seems to be disconcerting people.)

Yet one of the things I really like about Poly is how it decreases width. Disabling Poly would slightly harm layout on https://chrismorgan.info/blog/rust-fizzbuzz/ where I have code side-by-side, increasing the width required for the full layout without wrapping from ~1500px to 1600px. Ah well. It’s not critical, just makes me a little bit sad. (Admittedly I could get much of that back with `tab-size: 3;` instead of `tab-size: 4`, but that would doubtless make people baulk too. And I’m not going `tab-size: 2` except on small displays.)

I'd gently suggest that, while the side-by-side code works well with the early for loops, it's not adding as much value for the longer snippet with the Display impl.

Yikes, one hundred twenty dollars for a font? That's quite expensive. Have you found it to make that great a difference?

I use Triplicate, Concourse and Equity extensively. Equity, there are surprisingly few fonts like it in its organic feel and faithfulness to the old art of printing. (This doesn’t convey as much as I’d like, but I lack the terms of art to describe what I mean properly.) Triplicate, well, it’s the only good true serif monospace that I know of, and I like that. Each font fills a niche that I very much appreciate. I reckon they were worth buying on their own merits, but it was also then a way of supporting Matthew Butterick’s project https://practicaltypography.com/ which I strongly approve of as a project.

It's a beautiful font and I didn't notice that it wasn't fixed-width. Your whole site design is awesome, IMO.

As with most poetry jokes, you've worked backwards from the meter to find a line that fits the style but has nothing to do with the rest of the poem.

  Please rip out my eyes
  I can no longer read this
  Polyspaced madness

Note that GNU Coreutils also uses Make + shell scripts + diff for its test harness: https://github.com/coreutils/coreutils/tree/master/tests

It's not as slick or concise as the solution proposed here, but it shows that the approach is viable for medium-to-large projects.

I've used a similar technique for testing: generate an expected output and an actual output, then diff them.

One trick I found helpful was using JSON to serialize test results instead of unstructured plain text.

Test results stored as JSON are much easier to parse and therefore process. You can quickly whip up programs that verify the tests satisfy invariants, diff the results, and filter expected test changes out from unexpected ones.
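A minimal sketch of the idea (assuming `python3` is available for JSON pretty-printing; the file names are hypothetical): normalise key order and formatting first, so the diff only shows semantic changes.

```shell
# Normalise JSON test results before diffing, so key order and
# whitespace differences don't produce spurious failures.
printf '{"b": 2, "a": 1}\n' > actual.json
printf '{"a": 1, "b": 2}\n' > expected.json
python3 -m json.tool --sort-keys actual.json   > actual.norm.json
python3 -m json.tool --sort-keys expected.json > expected.norm.json
diff -u expected.norm.json actual.norm.json && echo PASS
```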

When I did it, we skipped the Make part and reinvented it in Python: https://github.com/libfirm/sisyphus

This aspect of expected and unexpected test changes is even more important than the diff part, in my opinion. It allows you to add failing tests immediately once you get the bug report, and you notice if you fix something accidentally.

The Readme of that project could do with a few examples of what tests and successful/unsuccessful output look like. I found the examples folder and still can't visualize what it might be like.

I've been using cram (also written in Python) for a private project and been mostly happy with it: https://bitheap.org/cram/

cram is a very good tool for testing in this manner: a test file is basically a copy/paste of a terminal window, deviations from expected behavior are represented using diffs, and `cram -i` will prompt you to update the test file with actual output. and it supports globbing and regular expressions for fuzzy matching.
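To illustrate the format (this is a hypothetical cram test file, not from any real project): commands are indented two spaces and prefixed with `$ `, expected output follows at the same indent, and a trailing `(re)` marks a line as a regex for dynamic output.

```
Smoke test for a hypothetical greet script:

  $ echo "hello world"
  hello world
  $ echo "build 20200117"
  build \d+ (re)
```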

i've been using cram for everything i write for what feels like a decade (it'll be 10 years old in september), and though it has its limits, i bitch and moan about it very little given how much i rely on it. if you knew me you'd recognize that this is a huge endorsement; i'm quite vocal about my disdain for most software in existence. :)

edit: s/i'll/it'll/ rofl

edit: url: https://bitheap.org/cram/

This is a really nice looking blog, I'm impressed.

A major drawback mentioned here is that make breaks on filenames containing whitespace, which is a big blocker for some uses. Does anyone know of a similar alternative which handles this?

I’m a big fan of using make in my projects. It’s nice to be able to sit down another dev or new user and just tell them to `make build` or `make test`. It also makes finding bugs easier as you can bisect with it.
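The pattern being described is just a thin wrapper Makefile; a hypothetical sketch (the npm commands are stand-ins for whatever the project really uses):

```make
# Top-level targets: a new contributor only needs to know
# `make build` and `make test`, regardless of the tooling underneath.
.PHONY: build test
build:
	npm run build
test: build
	npm test
```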

I would suggest you stop using a homegrown solution and use something robust like Bazel.


You could use the pattern stem automatic variable ($*) in lines five and six, replacing

    $(patsubst %.stdout,%.stderr,$@)

with $*.stderr.
Genius. I'm sure Brian Kernighan would approve :)

This is really awesome, thanks for the post!

Brilliant, I will definitely use this now. Love the simplicity!
