
Tom Duff on Duff's device (1988) - Insanity
https://www.lysator.liu.se/c/duffs-device.html
======
userbinator
_In fact, the example in the message is not implementable as memcpy, nor is
any computer likely to have an memcpy-like idiom that implements it_

REP OUTSW.

Reading a lot of literature from this period in computing, it seems that the
majority of "serious programmers" were focused mostly on
mainframes/minicomputers, and there was a divide between those people and the
ones who were focused almost exclusively on the PC/microcomputers. No one
thought x86 would become as commonplace as it is today.

 _Be careful when you use this thing that you don 't unwind the loop so much
that you overflow your machine's instruction cache._

Saving cache has become more important with the growing gap between core and
memory latency, and processors have gotten much better at predicting and
executing "around" loops (especially for superscalar/OoO), so much that in my
experience loop unrolling is very seldom beneficial to a great extent today.
You may still see improvements if you microbenchmark, but the extra code has
displaced something else from the cache and made the overall result slower.

~~~
cesarb
REP OUTSW writes to an I/O port. Duff's device writes to memory-mapped I/O.
It's not the same thing.

~~~
desertrider12
x86 can still do this. It has STOS (read next byte/word), LODS (memset), and
MOVS (memcpy).

------
thanatos_dem
Cool to look at the original work on this from 1988, and see just how little
it's changed in the nearly 3 decades since.

Go for instance makes use of Duff's devices generated for each architecture in
its core runtime. You can see the code to generate the devices here -
[https://golang.org/src/runtime/mkduff.go](https://golang.org/src/runtime/mkduff.go)

------
db48x
It's always worth rereading this when it comes up just for the last line:

"Many people (even bwk?) have said that the worst feature of C is that
switches don't break automatically before each case label. This code forms
some sort of argument in that debate, but I'm not sure whether it's for or
against."

I crack up every time.

------
harrisi
> Actually, I have another revolting way to use switches to implement
> interrupt driven state machines but it's too horrid to go into.

Does anyone have any insight into what he was talking about here?

~~~
kindfellow92
Yeah, I think he’s referring to what’s described in more detail here:
[https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html](https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html)

~~~
andreasgonewild
Been there, done that :) It's a neat trick, but the issue for me is that you
still need to pass a separate struct for state which breaks the illusion
enough that you might as well just write an ordinary iterator.

~~~
kindfellow92
I agree. If you want user-level threading, just use a stack swapping library
like libpth.

------
aap_
I really like td's style of writing C. You can find some of his code here:
[http://www.iq0.com/duffgram/index.html](http://www.iq0.com/duffgram/index.html)

~~~
bluedino
Very odd to see it without blank lines. I wonder what inspired this style?

------
g051051
Duff's Device was one of those "light bulb" moments, where my understanding of
computers and programming really expanded.

~~~
Bromskloss
Allow me to dramatise:

[Studies Duff's device]

"What... Oh, no, C! What are you doing?!"

~~~
kabdib
Always followed by wild laughter, when I've seen people get the light.

------
kyberias
LLVM seems to have this as a test case: [https://github.com/llvm-mirror/test-
suite/blob/master/Single...](https://github.com/llvm-mirror/test-
suite/blob/master/SingleSource/Regression/C/DuffsDevice.c)

------
DonHopkins
NeWS had some spectacular abuses of C preprocessor macros with nested switch
statement, loops and gotos.

Especially the conic splines code by Vaughan Pratt and James Gosling, which
"flattens" Bezier curves into conic splines (unlike straight lines like most
PostScript renderer implementations), which Vaughan Pratt's conix package can
draw really fast.

[http://www.chilton-
computing.org.uk/inf/literature/books/wm/...](http://www.chilton-
computing.org.uk/inf/literature/books/wm/p005.htm)

SunDew - A Distributed and Extensible Window System

[...] Two pieces of work were done at SUN which provide other key components
of the solution to the imaging problems. One is Vaughan Pratt's Conix [53], a
package for quickly manipulating curve bounded regions, and the other is Craig
Taylor's Pixscene [63], a package for performing graphics operations in
overlapped layers of bitmaps. [...]

[...] Pixscene is based on a shape algebra package. The ability, provided by
Conix, to do algebra very rapidly on curves should make non-rectangular
windows perform well. [...]

[53] Pratt, V., Techniques for Conic Splines, Computer Graphics 19(3),
pp.151159 (1985).

[63] Taylor, C. and Pratt, V., A Technique for Representing Layered Images.
SUN Microsystems Inc., 2550 Garcia Avenue, Mountain View, CA94043.

Direct Least-Squares Fitting of Algebraic Surfaces: Vaughan Pratt, Sun
Microsystems Inc. and Stanford University. April 30, 1987.

[http://boole.stanford.edu/pub/fit.pdf](http://boole.stanford.edu/pub/fit.pdf)

[http://donhopkins.com/home/code/NeWS/SUN/src/server/graphics...](http://donhopkins.com/home/code/NeWS/SUN/src/server/graphics/pathtoarc.c)

    
    
        /*-
            Convert a PostScript style path to an arclist
            pathtoarc.c, Thu Sep 19 15:45:44 1985
         */
    

[http://donhopkins.com/home/code/NeWS/SUN/src/server/graphics...](http://donhopkins.com/home/code/NeWS/SUN/src/server/graphics/arctochain.c)

    
    
        /*-
            Convert a conic arc to a curve
            arctochain.c, Tue Jun 11 14:23:17 1985
            "Elegance and truth are inversely related." -- Becker's Razor
                Vaughan Pratt
                Sun Microsystems
                This code is a version of the conix package that has been
                heavily massaged by James Gosling.
         */
    

[...]

    
    
        /* macros for the conic machines */
        #define out(b) {		\
            if (b)			\
            xpos += dir;		\
            else			\
            *ypos++ = xpos;		\
        }
    
        #define across(rel,q1) {						\
                out(1);							\
                Fx += Fxx, Fy += Fyx;					\
                if ((F += Fx) rel 0)					\
                    goto q1;					\
            }
    
        #define down(rel,q1) {							\
                out(0);							\
                if (--rem0 < 0)						\
                    goto final;					\
                Fx -= Fxy, Fy -= Fyy;					\
                if ((F -= Fy) rel 0)					\
                    goto q1;					\
            }
    
        #define trans1(axis, q1)						\
            for (;;)							\
                axis(>=, q1);
    
        #define trans2(axis, q1, rel, q2)					\
            for (;;) {							\
                axis(<, q1);						\
                if (Fx rel Fx_err)					\
                    goto q2;					\
            }
    

[...]

    
    
            /* enter conic machine at appropriate state */
            rem0 = size.y;
            if ((size.x > 0) == (side < 0))
            goto EOD;
            else
            goto OED;
            /* OE state machine */
        ODE:trans1(across, OED);
        DOE:trans2(across, ODE, >, OED);
        OED:trans2(down, ODE, <, DOE);
            /* EO state machine */
        EDO:trans1(down, DEO);
        DEO:trans2(across, EDO, >, EOD);
        EOD:trans2(down, EDO, <, DEO);
        final:;
            /*-	clearasil(buf0+1, bitlen); 	/* clean out 0011/1100 pimples */
            assert(ypos <= curve->x + (curve->y1 - curve->y0 + 1));
    

The main loop of the NeWS PostScript interpreter itself is the realization of
Duff's dire warning "Actually, I have another revolting way to use switches to
implement interrupt driven state machines but it's too horrid to go into."

[http://donhopkins.com/home/code/NeWS/SUN/src/server/nucleus/...](http://donhopkins.com/home/code/NeWS/SUN/src/server/nucleus/PostScript.c)

    
    
        /*
         * This is the PostScript interpreter.  Its implementation breaks almost every
         * rule of structured programming.  This is a result of needing to support
         * multiple PostScript processes and a paranoia about performance. There are a
         * couple of #includes that you should think of as procedure calls.  Life is
         * tough in the trenches.
         * 
         */
    
        /*
         * The PostScript interpreter has a fairly simple basic structure:
         * 
         * while (1) { 1. Pick up an object to interpret 2. Execute it }
         * 
         * Part 1, picking up an object to interpret is handled by the statement that
         * starts "switch (es->type)".  es points to the top of the execution stack: a
         * stack of sources of executable objects.  The switch determines what kind of
         * thing the object is being fetched from: an array, a string, ...
         * 
         * Part 2, executing an object, is handled by the code that starts with the label
         * "execute_top".  It does some tests and switches on the type of the object
         * being executed.
         * 
         * All of this is complicated by the mechanism for blocking.  If the process
         * needs to block, say by needing to read a new buffer from a socket or by
         * needing to wait for a message, then the interpreter needs to stop working
         * on this process and leave information around that lets it restart.  This is
         * the "restart_state" variable: it is used as the index to a switch statement
         * that is executed when the interpreter starts up, so it won't necessarily be
         * intering the main loop of the interpreter at the top. There are cleaner
         * ways of doing this, but they can't be written in C!
         */
    

[...]

    
    
            /*
             * Restart the interpreter where it left off last
             */
            switch (ee->restart_state) {
            case 1:
            goto read_1;
    

[...]

    
    
            case 24:
            goto read_24;
            }
    
            /* The main loop of the interpreter */
            while (1) {
            struct object new;
    

[...]

    
    
        /* Pick up an object to execute */
            switch (es->type) {
    

[...]

    
    
            case file_execution:{
                register PSFILE *f = es->env.file.file;
                /* be charitable and think of this as a procedure call */
        #ifdef GPROF_HOOKS
        /* gP_GetFile defined in parse_file.h */
        #endif
        #include    "parse_file.h"
                break;
                }
    

[http://donhopkins.com/home/code/NeWS/SUN/src/server/nucleus/...](http://donhopkins.com/home/code/NeWS/SUN/src/server/nucleus/parse_file.h)

    
    
        /*-
         * 	Parse a PostScript object from a PSFILE
         *
         * 	parse_file.h, Mon Dec 31 09:36:02 1984
         *
         * 		James Gosling,
         * 		Sun Microsystems
         */
    
        /*
         * This is a version of getc that pauses if the buffer is empty and suspends
         * the process
         */

