
Duff's device - simonpure
https://en.wikipedia.org/wiki/Duff%27s_device
======
j1elo
Don't miss out on the brilliant link at the bottom - Adam Dunkels'
Protothreads - Lightweight, Stackless Threads in C also uses nested
switch/case statements [1].

I learned about the theory with it, and used his library to implement
cooperative multitasking on a bare metal embedded AVR C project. I wrote a
task manager and dispatcher based on Protothreads, and that effort was what
later made it simple for me to understand the JS async/await paradigm that
would appear on my radar a couple of years later.

EDIT to add that I think the concepts might still apply, since these days I've
seen things like stackless async coroutines discussed for Rust... didn't study
those in detail yet, but I wouldn't be surprised if the basic concepts are the
same.

[1]: Current website:
[http://dunkels.com/adam/pt/](http://dunkels.com/adam/pt/)

~~~
sillysaurusx
Wikipedia entry on protothreads:
[https://en.wikipedia.org/wiki/Coroutine#Implementations_for_...](https://en.wikipedia.org/wiki/Coroutine#Implementations_for_C)

The source code is interesting. LC_SET is particularly neat:

    
    
      /* WARNING! lc implementation using switch() does not work if an
         LC_SET() is done within another switch() statement! */
      
      /** \hideinitializer */
      typedef unsigned short lc_t;
      
      #define LC_INIT(s) s = 0;
      
      #define LC_RESUME(s) switch(s) { case 0:
      
      #define LC_SET(s) s = __LINE__; case __LINE__:
      
      #define LC_END(s) }
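
A minimal sketch of how those macros can drive a resumable function, assuming a hypothetical `struct gen`, `gen_init`, and `gen_next` (none of these names come from the protothreads source). The `resumed` flag imitates the yield flag that protothreads uses; everything that must survive a yield lives in the struct, because ordinary locals are lost on return.

```c
#include <assert.h>

/* Local-continuation macros as quoted above. */
typedef unsigned short lc_t;
#define LC_INIT(s)   s = 0;
#define LC_RESUME(s) switch(s) { case 0:
#define LC_SET(s)    s = __LINE__; case __LINE__:
#define LC_END(s)    }

/* Hypothetical generator state: the continuation plus the loop counter. */
struct gen {
    lc_t lc;
    int  i;
};

void gen_init(struct gen *g)
{
    LC_INIT(g->lc)
}

/* Stores 0, 10, 20 in *out on successive calls and returns 1 each time;
   returns 0 once the sequence is exhausted. */
int gen_next(struct gen *g, int *out)
{
    int resumed = 1;              /* still 1 when we re-enter via a case label */

    assert(g != 0 && out != 0);
    LC_RESUME(g->lc)
    for (g->i = 0; g->i < 3; g->i++) {
        *out = g->i * 10;
        resumed = 0;              /* about to yield on this pass */
        LC_SET(g->lc)             /* record this line and plant a case label */
        if (!resumed)
            return 1;             /* yield; the next call jumps to the label */
    }
    LC_END(g->lc)
    return 0;
}
```

Calling `gen_next` in a loop until it returns 0 walks the sequence. The second and later calls jump straight into the body of the `for` loop, which is the same trick Duff's device relies on.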

~~~
edflsafoiewq
One cool thing you can do is to manage the numbers in the case manually, eg.

    
    
      #define LC_SET(s,num) s = num; case num:
    

This will let you serialize the state s, modify the program, recompile and
reload s, and have it resume at the desired point.

~~~
exDM69
You could take this one step further by using the __COUNTER__ macro to
generate unique numbers in the preprocessor.

Alas, this isn't a part of the standard (iirc), so you need to depend on
compiler-specific features.

~~~
edflsafoiewq
What I'm thinking of is modifying it like this

    
    
        LC_SET(s,0)
        A
      + LC_SET(s,2)
      + C
        LC_SET(s,1)
        B
    

When the number for each resume point is explicit instead of automatic, you
can edit the source code while controlling where any existing suspended
coroutines will resume at.
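
A rough sketch of that idea, assuming a hypothetical `WAIT_AT` helper built on the two-argument `LC_SET`, and two hypothetical builds of the same coroutine, `run_v1` and `run_v2`. The second build inserts step C under a fresh number and leaves B's number alone, so a state saved by the first build still lands on B.

```c
#include <assert.h>
#include <string.h>

#define LC_SET(s, num) s = num; case num:

/* Hypothetical helper: suspend at resume point num until cond holds. */
#define WAIT_AT(s, num, cond) LC_SET(s, num) if (!(cond)) return 0

/* Original program: A, then B. Returns 1 when finished, 0 when suspended.
   log must have room for the appended letters. */
int run_v1(int *s, int go, char *log)
{
    assert(s != 0 && log != 0);
    switch (*s) {
    case 0:
        strcat(log, "A");
        WAIT_AT(*s, 1, go);
        strcat(log, "B");
    }
    return 1;
}

/* Edited program: C is inserted between A and B. */
int run_v2(int *s, int go, char *log)
{
    assert(s != 0 && log != 0);
    switch (*s) {
    case 0:
        strcat(log, "A");
        WAIT_AT(*s, 2, go);   /* new resume point gets a fresh number */
        strcat(log, "C");
        WAIT_AT(*s, 1, go);   /* existing resume point keeps its number */
        strcat(log, "B");
    }
    return 1;
}
```

A coroutine suspended by `run_v1` with `*s == 1` resumes at B when handed to `run_v2`, while a fresh coroutine in `run_v2` runs A, C, B. With `__LINE__` numbering, the inserted lines would shift the label values and the saved state would match no case at all.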

------
jonahbenton
I worked with a recipient of this email for many years, and part of our work
involved writing C. It was a real pleasure reading his C, he was a master of
the form.

There is something I miss in modern languages...they do not permit the _abuse_
that C permits. The preprocessor, pointer arithmetic, various semantics around
types and overflows...one of course sees some horrendous monstrosities in the
Linux kernel source...

C as a tool is only an approximation of the underlying machine, as such the
tool can be bent to deliver semantics that are wholly surprising. I can think
of no mainstream language that has that property. It's a shame.

~~~
WalterBright
> I can think of no mainstream language that has that property.

Duff's Device works in D:

    
    
        void send(short* to, short* from, size_t count)
        {
            auto n = (count + 7) / 8;
            final switch (count % 8) {
            case 0: do { *to = *from++; goto case;
            case 7:      *to = *from++; goto case;
            case 6:      *to = *from++; goto case;
            case 5:      *to = *from++; goto case;
            case 4:      *to = *from++; goto case;
            case 3:      *to = *from++; goto case;
            case 2:      *to = *from++; goto case;
            case 1:      *to = *from++;
                    } while (--n > 0);
            }
        }
    

The `goto case;` is D's way of saying "fall through to the next case".

~~~
jadbox
Ah that's interesting, didn't know D had goto case statements. Btw, what are
some projects you've used D for? What are other surprising lesser known
features of it to you?

~~~
WalterBright
The biggest project is implementing the D compiler itself in 100% D. Don't
have much time for other projects.

Nested functions are a little gem. I stole the idea from Pascal. I've since
discovered that they can nicely clean up spaghetti code that C engenders,
without losing efficiency.

Nested functions would fit into C seamlessly, I sometimes wonder why the C
people don't adopt it. I can only assume they just don't realize how sweet
they are, which is understandable because you just have to use them a bit to
see it.

~~~
salgernon
How do D’s nested procedures differ from the Obj-C block extension?

The original pascal based Mac toolbox APIs encouraged nested procedures in
modal contexts - but nested procedures couldn’t escape their outer frame.

~~~
WalterBright
I have no idea what Obj-C block extensions are.

D's nested functions (like Pascal's) have two frame pointers - a dynamic frame
pointer (like C has), pointing to the caller's stack frame, and a static frame
pointer pointing to the enclosing function's stack frame.

This allows the nested function to access the enclosing function's variables.

It's a bit tricky to set up, but works a treat.
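
For readers who think in C, the static frame pointer can be imitated by hand. This is only an emulation in standard C, not what the D compiler literally emits: the names `outer_frame`, `add_scaled`, and `sum_scaled` are made up for illustration, and the "hidden" pointer is passed explicitly.

```c
#include <assert.h>

/* The enclosing function's variables that the "nested" function touches.
   With real nested functions these are reached through the static frame
   pointer; here the frame is an explicit struct. */
struct outer_frame {
    int total;
    int scale;
};

/* Stand-in for a nested function: static_link plays the hidden pointer
   to the enclosing function's frame. */
static void add_scaled(struct outer_frame *static_link, int x)
{
    static_link->total += x * static_link->scale;
}

/* The "enclosing" function: sums v[0..n-1], each multiplied by scale. */
int sum_scaled(const int *v, int n, int scale)
{
    struct outer_frame frame = { 0, scale };
    int i;

    assert(v != 0 || n == 0);
    for (i = 0; i < n; i++)
        add_scaled(&frame, v[i]);   /* the caller hands over its own frame */
    return frame.total;
}
```

The dynamic frame pointer needs no emulation here, since the ordinary C call already provides it.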

------
dang
See also:

2017
[https://news.ycombinator.com/item?id=15679205](https://news.ycombinator.com/item?id=15679205)
- particularly [https://www.lysator.liu.se/c/duffs-device.html](https://www.lysator.liu.se/c/duffs-device.html)

2016 (a bit)
[https://news.ycombinator.com/item?id=12758686](https://news.ycombinator.com/item?id=12758686)
and
[https://news.ycombinator.com/item?id=10830744](https://news.ycombinator.com/item?id=10830744)

2015
[https://news.ycombinator.com/item?id=10175901](https://news.ycombinator.com/item?id=10175901)

2015 (in Go)
[https://news.ycombinator.com/item?id=8990742](https://news.ycombinator.com/item?id=8990742)

2012
[https://news.ycombinator.com/item?id=4360306](https://news.ycombinator.com/item?id=4360306)
(note the personal anecdote)

2010
[https://news.ycombinator.com/item?id=1896643](https://news.ycombinator.com/item?id=1896643)

2009
[https://news.ycombinator.com/item?id=761177](https://news.ycombinator.com/item?id=761177)

2008
[https://news.ycombinator.com/item?id=195763](https://news.ycombinator.com/item?id=195763)

------
_pastel
Every time this article is reposted I can't help but laugh at the line:

"C's default fall-through in case statements has long been one of its most
controversial features; Duff himself said that 'This code forms some sort of
argument in that debate, but I'm not sure whether it's for or against.'"

------
ncmncm
Read the source to Putty sometime.

The main program is one big, big function with a big, big loop and switch in
it. Each place where it does something that would (otherwise) block, it sets a
state variable to match a case label right after, does a non-blocking version,
and breaks. (This is done with a macro.)

After the break, it stashes the state, and if anything is ready, switches to
that state and re-enters the switch. Otherwise, it sleeps waiting for work
that can proceed.

As a colleague remarked, "I love it, and hate myself for loving it."

------
burfog
Something like it works for if...else...if...else too. To my horror, this was
more readable and maintainable than any alternative I could imagine.

    
    
        switch(override){
        default:
            printf("unexpected...\n");
            // fallthrough
        case 0:
            if(z_bits&4 || N_flag){
        case 100:
                handle_100();
            }else if(x_bits & 0x80){
        case 200:
                handle_200();
            }else{
        case 500:
                handle_500();
            }
        }
    

It's cut down a bit for Hacker News. The original had about a dozen lines of
code where each handle_XXX function now sits.
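
A compilable variant of the same shape, as a sketch: the `handle_XXX` calls are replaced by recording which branch ran, and the function name `dispatch` and its parameter list are assumptions made for the example.

```c
#include <assert.h>

/* Returns which branch ran. override 0 means "decide from the flags";
   100, 200, or 500 forces that branch; any other value is treated as 0. */
int dispatch(int override, unsigned z_bits, unsigned x_bits, int N_flag)
{
    int ran = 0;

    switch (override) {
    default:
        /* unexpected override: fall through and decide normally */
    case 0:
        if ((z_bits & 4) || N_flag) {
    case 100:
            ran = 100;
        } else if (x_bits & 0x80) {
    case 200:
            ran = 200;
        } else {
    case 500:
            ran = 500;
        }
    }
    assert(ran != 0);   /* every path goes through exactly one branch */
    return ran;
}
```

Jumping to `case 200` lands inside the `else if` body with its condition never evaluated, and once that body finishes, control leaves the whole if/else chain rather than continuing into the final `else`.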

------
chc4
While not a "pure" Duff's Device, Golang apparently does something similar
with STOSD for an unrolled memcpy that it can call into at different points. I
think it's really cute.

[https://twitter.com/mayahustle/status/1260678725257003008](https://twitter.com/mayahustle/status/1260678725257003008)

------
zx2c4
I used one of these for the implementation of Siphash in the Linux kernel, if
you're interested in a real world example:
[https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/siphash.c#n52)

~~~
OskarS
Maybe I'm missing something, but that just looks like a regular C switch to
me. Yeah, it has fall-through, but all C switches have fall-through unless you
explicitly break. The point of Duff's device is that it _interleaves_ two
different structures: a switch and a loop.

It's unusual because in structured languages you expect the lexical structure
to be entirely hierarchical (you wouldn't have an if block that somehow
overlapped with a for-loop, for instance). Duff's device breaks that
assumption because it uses one control structure (the switch) to jump into the
middle of a second control structure (the loop), and lexically, the switch is
_both_ inside the loop and outside of it.

This is why lots of people when they first see it don't think it is even
syntactically valid. A regular switch with fall-through doesn't violate the
"hierarchical lexical structure" assumption in that way.

------
Symmetry
This is, oddly, one of the few things I think would actually be clearer in
assembly language than it is in C.

~~~
marvy
You can actually get rid of the confusing nesting:

    
    
        int passes = (n+3)/4;  /* trips through the loop; assumes n > 0 */
        switch(n%4)
        {
        loop:
        case 0: stuff();
        case 3: stuff();
        case 2: stuff();
        case 1: stuff();
        if(--passes) goto loop;
        }

------
api
This is a fun language hack, but be aware that this is probably not good for
performance in 2020. Nearly anything larger than a coffee machine
microcontroller these days probably has instruction level parallelism and also
operates more efficiently on larger data words. Obviously it depends on what
you're trying to do but using this for any kind of iterative memory op is
probably slow. It's also such an oddball construction that the compiler is not
likely to be able to optimize it.

~~~
m463
Yeah, I think dma is more common now, memory mapped registers aren't as common
anymore.

------
lmilcin
I have actually used this in production for cooperative multitasking framework
for Verifone VX510.

I needed this to run multiple stuff at the same time (for example printing
while also doing network -- both serial devices on VX510).

Could have done this with setjmp/longjmp but it would not be as fun:)

------
whoisjuan
> Its discovery is credited to Tom Duff in November 1983, when Duff was
> working for Lucasfilm and used it to speed up a real-time animation program.

It's mind-boggling to me that something like that can be "discovered". It's
certainly "discovered" because it relies on a particular setup that relies on
specific low-level features of C. Perhaps I would have denoted it as
"invented" but the more I read about it, the more I realize that it probably
doesn't qualify as something that was "invented" for the same reasons.

~~~
tonyedgecombe
Didn't the whole meta-programming thing with C++ templates start because
someone noticed unusual output in an error message?

------
throwaway_pdp09
Would a modern C compiler even permit that?

The article does say "At the time of the device's invention this was the
first edition of The C Programming Language which requires only that the body
of the switch be a syntactically valid (compound) statement..." which implies
it isn't valid now.

Any correct parser based on any sane grammar should instantly choke on that.

~~~
edflsafoiewq
Yes, it's correct modern C. I'm not sure what that paragraph is supposed to
mean. AFAICT there is no difference between old and modern C in how the cases
can appear in the body of the switch. Here is the first edition of _The C
Programming Language_ on switch:

> switch (expression) statement

>The usual arithmetic conversion is performed on the expression, but the
result must be int. The statement is typically compound. Any statement within
the statement may be labeled with one or more case prefixes as follows

> case constant_expression :

[https://archive.org/details/TheCProgrammingLanguageFirstEdit...](https://archive.org/details/TheCProgrammingLanguageFirstEdition/page/n209/mode/2up)

~~~
SV_BubbleTime
I can’t at all support this with a source, but I had THOUGHT the register
keyword needed to make this fast was largely up to how the compiler decides to
use it. That register can’t be required to be register vs ram, but it’s
encouraged to be IF the compiler can do it.

So I thought the downside was that you can declare “register” and the compiler
might not obey, but you then definitely cannot use & to get the memory address
of a variable declared with the keyword. And in portable code, register isn’t
recommended.

So I suppose it’s not entirely fair to say all modern or all past support
this. Right?

~~~
marvy
You're tangling a bunch of stuff together in a confusing mess. Let's take one
thing at a time.

1\. You're right that you can't use the & operator on a variable that was
declared using the register keyword.

2\. However, if you look very carefully at the example code, it actually
follows this rule. In fact, it doesn't use the & operator at all. There's a
lot of pointers all over the place, but no usage of the & operator.

3\. You're also right that the register keyword is just a hint that the
compiler is free to ignore.

4\. Expanding on the previous point: modern compilers usually will ignore it,
but in the olden days, compilers were not so good at register allocation, so
this keyword was needed if you wanted speeds competitive with assembly. Since
this is code from the 80s, no surprise that it uses this keyword.

5\. Now that we've got all that out of the way: the register keyword is a
complete distraction here, and has nothing at all to do with Duff's device. Since
modern compilers ignore it anyway, you can take the original example and just
delete all uses of the word "register". The point of Duff's device is how to
do manual loop unrolling without then having a special "pre-loop" or "post-
loop" which takes care of "leftovers" because your array size was not
divisible by 8, or 4, or whatever unroll factor you happened to use.
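
To make point 5 concrete, here is a sketch that puts the two approaches side by side with an unroll factor of 4. It sums an array instead of writing to a port so the result is easy to check; the function names `sum_unrolled` and `sum_duff` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional 4x unrolling: a main loop plus a separate "post-loop"
   that mops up the leftovers when count is not a multiple of 4. */
long sum_unrolled(const int *v, size_t count)
{
    long sum = 0;
    size_t i = 0;

    assert(v != NULL || count == 0);
    for (; i + 4 <= count; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < count; i++)          /* remainder loop */
        sum += v[i];
    return sum;
}

/* Duff's device, 4x: the switch jumps into the middle of the loop body,
   so the first (partial) pass absorbs the count % 4 leftovers and no
   separate remainder loop is needed. */
long sum_duff(const int *v, size_t count)
{
    long sum = 0;
    size_t passes;

    assert(v != NULL || count == 0);
    if (count == 0)
        return 0;                   /* the device itself assumes count > 0 */
    passes = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { sum += *v++;
    case 3:      sum += *v++;
    case 2:      sum += *v++;
    case 1:      sum += *v++;
            } while (--passes > 0);
    }
    return sum;
}
```

Both functions visit every element exactly once. Whether the second form is any faster on a given compiler and CPU is a separate question that only measurement can answer.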

------
ars
It's a very fun historical article, but it's no longer worth it on modern
CPUs, where it actually harms performance.

~~~
enriquto
It's more a comment about the syntax of the C switch statement than about
micro-optimization.

~~~
klyrs
There's two kinds of C programmers: those familiar with Duff's device, and
those who wouldn't expect Duff's device to even compile.

~~~
AdmiralAsshat
I'm familiar with it and I _still_ don't think it should compile...

~~~
burfog
I'm disappointed that gcc's nested function extension doesn't let me use a
switch to have a second entry point for the inner function.

I also can't seem to get there with a plain goto. I can't jump out either.
Probably the computed goto extension will do the job, but that is cheating.

~~~
marvy
Using computed goto would (probably) make the code compile, but it wouldn't
actually work. That would be heading directly into undefined behavior
territory (aka land of the nasal demons).

To give just one or two examples of why this is doomed: imagine you want to
jump "inward". If the inner function takes any arguments, you haven't passed
them! And if you jump inward, and the inner function then hits a return
statement, do you return to the outer function, or to its caller?

And after you resolve all such semantic issues, you need to actually implement
all this, without slowing down code which doesn't use this feature.

And all this complexity doesn't buy you much because you have two other
options:

1\. just call the function to jump inwards, and simply return to jump out.

2\. Don't bother with nested functions, and just inline the code.

~~~
marvy
So it turns out I'm kind of wrong! My arguments for jumping "inward" still
seem pretty valid, but they don't generalize to jumping "outward". And indeed,
GCC actually allows this, but you need to use the right syntax:

[https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html](https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html)

------
fred_is_fred
When I was in college people had text files of tricks like this that they
would send around. Over time you built up a library of these little hacks that
could make a real difference especially in rendering or graphics applications.

------
adamnemecek
Please don't use Duff's device. Look at how your favorite c std lib implements
memcpy and think about that rather than this.

~~~
alan
Duff's device does not do memcpy. It's writing all the elements of a string to
the same location in memory, where there's a port.
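
A sketch of that shape in C, with `fake_port` as a hypothetical stand-in for the memory-mapped register (on real hardware the address would come from the platform). Note that `to` is never incremented, which is exactly what separates this from a memcpy.

```c
#include <assert.h>

/* Hypothetical stand-in for a memory-mapped output register. volatile
   keeps the compiler from collapsing the repeated stores into one. */
volatile short fake_port;

/* Duff's device in its original shape: every element of 'from' is
   written to the same location '*to'. Requires count > 0. */
void send(volatile short *to, const short *from, int count)
{
    int n;

    assert(count > 0);
    n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
```

After `send(&fake_port, data, k)` the port holds `data[k - 1]`, the last value written; the earlier values were each visible at that address only until the next store.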

