Hacker News
Duff's device: do + switch fall-through = loop unrolling (wikipedia.org)
16 points by JBiserkov on Aug 13, 2009 | 11 comments



This is the important bit:

When numerous instances of Duff's device were removed from the XFree86 Server in version 4.0, there was a notable improvement in performance.[3] Therefore, when considering using this code, it may be worth running a few benchmarks to verify that it actually is the fastest code on the target architecture, at the target optimization level, with the target compiler.

Optimization without testing is worse than pointless. It's premature, and we know that's evil.


Or perhaps in this case it's more subtle: code becomes optimized for the wrong thing over time. For example, the difference in optimization strategies between NetBurst Pentium 4s and Core 2 Duos is staggering.


Hmm,

Great point: not only does optimization cost something in understanding and system fragility, but it can also become stale over time, and invisibly so.

If you're working at the application level, testing and segregating optimizations like this seems important. Systems-level people would operate differently, though...


C's default fall-through in case statements has long been its most controversial single feature; Duff observed that

"This code forms some sort of argument in that debate, but I'm not sure whether it's for or against."

hack^2: it's so brilliant and so ugly that even its author doesn't know which it is more.


Are you sure case-label fall-through has been its most controversial feature? It's so obscure (most people haven't heard of it) and rarely used. I'm guessing there are other aspects of C that are more controversial, or at the very least, more widely debated.

By the by, we've used this "feature" a fair bit in C and C++ for protothreads in small embedded systems: see http://blog.brush.co.nz/2008/07/protothreads/ and http://www.sics.se/~adam/pt/


It's a cute hack, and I'm sure it used to matter. I saw it used in djb's qmail. Is it even worth thinking about? I don't know. I tend not to be a fan of these micro-optimizations on modern hardware.


Valid point. You'd probably have to hand-inspect the generated ASM to make sure the compiler didn't really dick something up.

Also, with the optimizations on modern chipsets it might not be immediately obvious which implementation will actually run faster.


Duff now works at Pixar, but before that he worked on Plan 9 (http://9fans.net/).

Including rc, the shell still in use today (http://doc.cat-v.org/plan_9/4th_edition/papers/rc): best shell ever.

And my favourite web browser: Mothra

http://en.wikipedia.org/wiki/Mothra_(web_browser)

While you argue over CSS vs. tables, try writing for a browser that has neither!


This is no longer useful as an optimization technique (loop unrolling makes things slower these days), but I believe the device still gets used in state machines and in libraries for writing multithreaded-style apps in a single process.


Yes, for "protothreads", see: http://www.sics.se/~adam/pt/


I wrote up a little walkthrough a few years back; maybe it would make DD a little clearer.

http://blog.fogus.me/2006/07/10/duff’s-device-walkthrough/



