

Duff's device: do + switch fall-through = loop unrolling - JBiserkov
http://en.wikipedia.org/wiki/Duffs_device

======
biohacker42
This is the important bit:

 _When numerous instances of Duff's device were removed from the XFree86
Server in version 4.0, there was a notable improvement in performance.[3]
Therefore, when considering using this code, it may be worth running a few
benchmarks to verify that it actually is the fastest code on the target
architecture, at the target optimization level, with the target compiler._

Optimization without testing is worse than pointless. It's premature and we
know that's evil.

~~~
TimothyFitz
Or perhaps in this case it's more subtle: code becomes optimized for the wrong
thing over time. For example, the difference in optimization strategies between
NetBurst Pentium 4s and Core 2 Duos is staggering.

~~~
joe_the_user
Hmm,

Great point: not only does optimization cost in understanding and system
fragility but it can become stale over time and _invisibly_ so.

If you're working on the application level, testing and segregating
optimizations like this seems important. Systems level people would operate
differently though ...

------
JBiserkov
C's default fall-through in case statements has long been its most
controversial single feature; Duff observed that

"This code forms some sort of argument in that debate, but I'm not sure
whether it's for or against."

hack^2 - it's so brilliant and so ugly, even its author doesn't know which is
more.
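
For anyone who hasn't seen it written out, here is a rough sketch of the
trick in C. It is a memory-to-memory copy variant, not Duff's exact code (his
version wrote every element to a single output register, so `to` was never
incremented). The name `duff_copy` and the early return for a zero count are
my own additions for illustration; the original simply assumed count > 0.

```c
#include <stddef.h>

/* Copy `count` shorts from `from` to `to`, with the loop body unrolled 8x.
   duff_copy is a hypothetical name; the count == 0 guard is an addition. */
void duff_copy(short *to, const short *from, size_t count)
{
    if (count == 0)
        return;                    /* without this, case 0 would copy 8 items */

    size_t n = (count + 7) / 8;    /* number of passes through the do-while */

    switch (count % 8) {           /* jump into the middle of the loop body */
    case 0: do { *to++ = *from++;  /* each case falls through to the next */
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

The first pass handles the count % 8 leftover elements by entering the loop
part-way through; every later pass copies a full eight. Whether that beats a
plain loop is exactly the kind of thing you would have to measure.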

~~~
benhoyt
Are you sure the case-label-fall-through has been its _most controversial_
feature? It's so obscure (most people haven't heard of it) and rarely-used --
I'm guessing there are other aspects of C that are more controversial, or at
the very least, more widely debated.

By the by, we've used this "feature" a fair bit in C and C++ for protothreads
in small embedded systems: see <http://blog.brush.co.nz/2008/07/protothreads/>
and <http://www.sics.se/~adam/pt/>
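
To give a flavour of how that works, here is a minimal sketch of the
switch-based resume trick. The macro names (CO_BEGIN, CO_YIELD, CO_END) and
the `counter` example are made up for illustration and are not the actual
protothreads API; see the links above for the real thing.

```c
/* Hypothetical macros for illustration only; not the protothreads API. */
struct co { int line; };   /* source line to resume at; 0 means "start" */

#define CO_BEGIN(c)  switch ((c)->line) { case 0:
#define CO_YIELD(c)  do { (c)->line = __LINE__; return 1; \
                          case __LINE__:; } while (0)
#define CO_END(c)    } (c)->line = 0; return 0

/* "Locals" live in the struct because the C stack frame is lost on return. */
struct counter { struct co co; int i; };

/* Produces 1, 2, 3 across successive calls.
   Returns 1 and sets *out when a value is produced, 0 when finished. */
int counter_next(struct counter *s, int *out)
{
    CO_BEGIN(&s->co);
    for (s->i = 1; s->i <= 3; s->i++) {
        *out = s->i;
        CO_YIELD(&s->co);   /* return now; the next call resumes right here */
    }
    CO_END(&s->co);
}
```

The case label sits inside the for loop, so the next call's switch jumps
straight back into the loop body, the same way Duff's device jumps into the
middle of its do-while. Start with a zeroed `struct counter` and call
`counter_next` until it returns 0.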

------
yan
It's a cute hack and I'm sure it used to matter. I saw it used in djb's qmail.
Is it worth even thinking about it? I don't know. I tend to not be a fan of
these micro-optimizations on modern hardware.

~~~
adatta02
Valid point. You'd probably have to hand inspect the generated ASM to make
sure the compiler didn't really dick something up.

Also, with the optimizations on modern chipsets it might not be immediately
obvious which implementation will actually run faster.

------
skwaddar
Duff now works at Pixar, but before that he worked on Plan 9: <http://9fans.net/>

Including the shell still in use today
<http://doc.cat-v.org/plan_9/4th_edition/papers/rc> best shell ever

And my favourite web browser : Mothra

<http://en.wikipedia.org/wiki/Mothra_(web_browser)>

while you argue over css / tables - try writing for a browser that has neither!

------
ars
This is no longer useful as an optimization technique (loop unrolling makes
things slower these days), but I believe the device gets used in state machines,
and in libraries for making multithreaded apps in a single process.

~~~
benhoyt
Yes, for "protothreads", see: <http://www.sics.se/~adam/pt/>

------
fogus
I wrote up a little walkthrough a few years back... maybe it would make DD a
little clearer.

<http://blog.fogus.me/2006/07/10/duff’s-device-walkthrough/>

