Hacker News new | past | comments | ask | show | jobs | submit login
Tom Duff on Duff's device (1988) (liu.se)
54 points by Insanity on Nov 12, 2017 | hide | past | favorite | 18 comments



In fact, the example in the message is not implementable as memcpy, nor is any computer likely to have an memcpy-like idiom that implements it

REP OUTSW.

Reading a lot of literature from this period in computing, it seems that the majority of "serious programmers" were focused mostly on mainframes/minicomputers, and there was a divide between those people and the ones who were focused almost exclusively on the PC/microcomputers. No one thought x86 would become as commonplace as it is today.

Be careful when you use this thing that you don't unwind the loop so much that you overflow your machine's instruction cache.

Saving cache has become more important with the growing gap between core and memory latency, and processors have gotten much better at predicting and executing "around" loops (especially for superscalar/OoO), so much that in my experience loop unrolling is very seldom beneficial to a great extent today. You may still see improvements if you microbenchmark, but the extra code has displaced something else from the cache and made the overall result slower.


REP OUTSW writes to an I/O port. Duff's device writes to memory-mapped I/O. It's not the same thing.


x86 can still do this. It has STOS (read next byte/word), LODS (memset), and MOVS (memcpy).


It's always worth rereading this when it comes up just for the last line:

"Many people (even bwk?) have said that the worst feature of C is that switches don't break automatically before each case label. This code forms some sort of argument in that debate, but I'm not sure whether it's for or against."

I crack up every time.


Cool to look at the original work on this from 1988, and see just how little it's changed in the nearly 3 decades since.

Go for instance makes use of Duff's devices generated for each architecture in its core runtime. You can see the code to generate the devices here - https://golang.org/src/runtime/mkduff.go


> Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.

Does anyone have any insight into what he was talking about here?


Yeah, I think he’s referring to what’s described in more detail here: https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html


Been there, done that :) It's a neat trick, but the issue for me is that you still need to pass a separate struct for state which breaks the illusion enough that you might as well just write an ordinary iterator.


I agree. If you want user-level threading, just use a stack swapping library like libpth.


Most well-written portable interrupt code needs to pass an explicit context anyway, so... it's not as bad as it sounds.


See the NeWS PostScript interpreter code I posted in this thread.

NeWS has multiple PostScript threads, which must be able to block and re-start in the middle of parsing PostScript expressions or anything else that might block, so the entire interpreter has a switch statement for dispatching PostScript operators, and any of those operators which might block have "goto" labels so the interpreter can jump back into the middle of the operator when the threads gets switched back in.

The most complicated operator is the PostScript parser itself (the "file_execution" operator), which blocks in many places, and whose C source code was broken out into a .h file with the comment "be charitable and think of this as a procedure call".

This is actually quite similar to how C# and other languages implement co-routines under the hood. It's just written by hand in portable C.

Here's the comment explaining it:

    /*
     * The PostScript interpreter has a fairly simple basic structure:
     * 
     * while (1) { 1. Pick up an object to interpret 2. Execute it }
     * 
     * Part 1, picking up an object to interpret is handled by the statement that
     * starts "switch (es->type)".  es points to the top of the execution stack: a
     * stack of sources of executable objects.  The switch determines what kind of
     * thing the object is being fetched from: an array, a string, ...
     * 
     * Part 2, executing an object, is handled by the code that starts with the label
     * "execute_top".  It does some tests and switches on the type of the object
     * being executed.
     * 
     * All of this is complicated by the mechanism for blocking.  If the process
     * needs to block, say by needing to read a new buffer from a socket or by
     * needing to wait for a message, then the interpreter needs to stop working
     * on this process and leave information around that lets it restart.  This is
     * the "restart_state" variable: it is used as the index to a switch statement
     * that is executed when the interpreter starts up, so it won't necessarily be
     * intering the main loop of the interpreter at the top. There are cleaner
     * ways of doing this, but they can't be written in C!
     */


I really like td's style of writing C. You can find some of his code here: http://www.iq0.com/duffgram/index.html


Very odd to see it without blank lines. I wonder what inspired this style?


Duff's Device was one of those "light bulb" moments, where my understanding of computers and programming really expanded.


Allow me to dramatise:

[Studies Duff's device]

"What... Oh, no, C! What are you doing?!"


Always followed by wild laughter, when I've seen people get the light.



NeWS had some spectacular abuses of C preprocessor macros with nested switch statement, loops and gotos.

Especially the conic splines code by Vaughan Pratt and James Gosling, which "flattens" Bezier curves into conic splines (unlike straight lines like most PostScript renderer implementations), which Vaughan Pratt's conix package can draw really fast.

http://www.chilton-computing.org.uk/inf/literature/books/wm/...

SunDew - A Distributed and Extensible Window System

[...] Two pieces of work were done at SUN which provide other key components of the solution to the imaging problems. One is Vaughan Pratt's Conix [53], a package for quickly manipulating curve bounded regions, and the other is Craig Taylor's Pixscene [63], a package for performing graphics operations in overlapped layers of bitmaps. [...]

[...] Pixscene is based on a shape algebra package. The ability, provided by Conix, to do algebra very rapidly on curves should make non-rectangular windows perform well. [...]

[53] Pratt, V., Techniques for Conic Splines, Computer Graphics 19(3), pp.151159 (1985).

[63] Taylor, C. and Pratt, V., A Technique for Representing Layered Images. SUN Microsystems Inc., 2550 Garcia Avenue, Mountain View, CA94043.

Direct Least-Squares Fitting of Algebraic Surfaces: Vaughan Pratt, Sun Microsystems Inc. and Stanford University. April 30, 1987.

http://boole.stanford.edu/pub/fit.pdf

http://donhopkins.com/home/code/NeWS/SUN/src/server/graphics...

    /*-
        Convert a PostScript style path to an arclist
        pathtoarc.c, Thu Sep 19 15:45:44 1985
     */
http://donhopkins.com/home/code/NeWS/SUN/src/server/graphics...

    /*-
        Convert a conic arc to a curve
        arctochain.c, Tue Jun 11 14:23:17 1985
        "Elegance and truth are inversely related." -- Becker's Razor
            Vaughan Pratt
            Sun Microsystems
            This code is a version of the conix package that has been
            heavily massaged by James Gosling.
     */
[...]

    /* macros for the conic machines */
    #define out(b) {		\
        if (b)			\
        xpos += dir;		\
        else			\
        *ypos++ = xpos;		\
    }

    #define across(rel,q1) {						\
            out(1);							\
            Fx += Fxx, Fy += Fyx;					\
            if ((F += Fx) rel 0)					\
                goto q1;					\
        }

    #define down(rel,q1) {							\
            out(0);							\
            if (--rem0 < 0)						\
                goto final;					\
            Fx -= Fxy, Fy -= Fyy;					\
            if ((F -= Fy) rel 0)					\
                goto q1;					\
        }

    #define trans1(axis, q1)						\
        for (;;)							\
            axis(>=, q1);

    #define trans2(axis, q1, rel, q2)					\
        for (;;) {							\
            axis(<, q1);						\
            if (Fx rel Fx_err)					\
                goto q2;					\
        }
[...]

        /* enter conic machine at appropriate state */
        rem0 = size.y;
        if ((size.x > 0) == (side < 0))
        goto EOD;
        else
        goto OED;
        /* OE state machine */
    ODE:trans1(across, OED);
    DOE:trans2(across, ODE, >, OED);
    OED:trans2(down, ODE, <, DOE);
        /* EO state machine */
    EDO:trans1(down, DEO);
    DEO:trans2(across, EDO, >, EOD);
    EOD:trans2(down, EDO, <, DEO);
    final:;
        /*-	clearasil(buf0+1, bitlen); 	/* clean out 0011/1100 pimples */
        assert(ypos <= curve->x + (curve->y1 - curve->y0 + 1));
The main loop of the NeWS PostScript interpreter itself is the realization of Duff's dire warning "Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into."

http://donhopkins.com/home/code/NeWS/SUN/src/server/nucleus/...

    /*
     * This is the PostScript interpreter.  Its implementation breaks almost every
     * rule of structured programming.  This is a result of needing to support
     * multiple PostScript processes and a paranoia about performance. There are a
     * couple of #includes that you should think of as procedure calls.  Life is
     * tough in the trenches.
     * 
     */

    /*
     * The PostScript interpreter has a fairly simple basic structure:
     * 
     * while (1) { 1. Pick up an object to interpret 2. Execute it }
     * 
     * Part 1, picking up an object to interpret is handled by the statement that
     * starts "switch (es->type)".  es points to the top of the execution stack: a
     * stack of sources of executable objects.  The switch determines what kind of
     * thing the object is being fetched from: an array, a string, ...
     * 
     * Part 2, executing an object, is handled by the code that starts with the label
     * "execute_top".  It does some tests and switches on the type of the object
     * being executed.
     * 
     * All of this is complicated by the mechanism for blocking.  If the process
     * needs to block, say by needing to read a new buffer from a socket or by
     * needing to wait for a message, then the interpreter needs to stop working
     * on this process and leave information around that lets it restart.  This is
     * the "restart_state" variable: it is used as the index to a switch statement
     * that is executed when the interpreter starts up, so it won't necessarily be
     * intering the main loop of the interpreter at the top. There are cleaner
     * ways of doing this, but they can't be written in C!
     */
[...]

        /*
         * Restart the interpreter where it left off last
         */
        switch (ee->restart_state) {
        case 1:
        goto read_1;
[...]

        case 24:
        goto read_24;
        }

        /* The main loop of the interpreter */
        while (1) {
        struct object new;
[...]

    /* Pick up an object to execute */
        switch (es->type) {
[...]

        case file_execution:{
            register PSFILE *f = es->env.file.file;
            /* be charitable and think of this as a procedure call */
    #ifdef GPROF_HOOKS
    /* gP_GetFile defined in parse_file.h */
    #endif
    #include    "parse_file.h"
            break;
            }
http://donhopkins.com/home/code/NeWS/SUN/src/server/nucleus/...

    /*-
     * 	Parse a PostScript object from a PSFILE
     *
     * 	parse_file.h, Mon Dec 31 09:36:02 1984
     *
     * 		James Gosling,
     * 		Sun Microsystems
     */

    /*
     * This is a version of getc that pauses if the buffer is empty and suspends
     * the process
     */




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: