
Duff's Device: loop unrolling for interpreted languages - breily
http://www.hackszine.com/blog/archive/2008/05/duffs_device_loop_unrolling_fo.html?CMP=OTC-7G2N43923558
======
aston
This is the wrong answer in so many ways. To start with: if you're in a
dynamic language and so performance-sensitive that you can't take the hit of
a loop comparison, you're doing it wrong.

But... pretending you actually need to unroll your loops, you might ask: why
did Duff choose "8" as the magic number? Was it, perhaps, by looking at the
code his compiler kicked out for the loop and choosing the number that
optimized byte copying on his architecture, given some constraint about
common buffer sizes? Yes. Lacking an "architecture" target for JavaScript, it
probably doesn't make much sense to use Duff's constant just because he did.
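
For reference, the kind of JavaScript port under discussion usually looks
something like the sketch below. `duffForEach` and `process` are hypothetical
names, not taken from the linked article, and the factor of 8 is simply copied
from Duff's C version rather than tuned for any engine. JavaScript can't
interleave a `switch` with a `do`/`while` the way C can, so the `switch` sits
inside the loop and the empty-array guard has to be added by hand.

```javascript
// Hypothetical helper: calls process(item) once per element, with the loop
// body unrolled 8 times per pass. The factor 8 is copied from Duff's C code,
// not measured.
function duffForEach(items, process) {
  const n = items.length;
  if (n === 0) return;            // the do/while below always runs at least once
  let passes = Math.ceil(n / 8);  // number of trips through the switch
  let startAt = n % 8;            // leftover items are handled on the first pass
  let i = 0;
  do {
    switch (startAt) {            // every case falls through on purpose
      case 0: process(items[i++]);
      case 7: process(items[i++]);
      case 6: process(items[i++]);
      case 5: process(items[i++]);
      case 4: process(items[i++]);
      case 3: process(items[i++]);
      case 2: process(items[i++]);
      case 1: process(items[i++]);
    }
    startAt = 0;                  // later passes run all eight calls
  } while (--passes > 0);
}
```

Whether this beats a plain `for` loop depends entirely on the engine and on
how expensive `process` is.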

And... if your function body is relatively slow, you probably don't need to
unroll anything in the first place. Duff had a beautiful one-liner in there
that in some situations is basically one clock cycle's worth of work.
JavaScript has no comparably cheap construct that couldn't simply be batched
instead (eight ++'s => one += 8). And if you make a function call, you've
already made your inner loop slow.
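
To make the batching point concrete, here is a minimal sketch (the function
names are made up for illustration). Duff's `*to = *from++` wrote each value
to the same output register, so his eight steps could not be merged; eight
increments of a counter can.

```javascript
// Eight separate increments, as an unrolled loop body would contain.
function bumpUnrolled(count) {
  count++; count++; count++; count++;
  count++; count++; count++; count++;
  return count;
}

// The same result from a single addition: nothing is left to unroll.
function bumpBatched(count) {
  return count + 8;
}
```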

Moral of the story: Don't do this.

~~~
xirium
> why did Duff choose "8" as the magic number?

My rule of thumb when unrolling loops in assembly was to unroll no further
than the bit width of the machine. So, on eight-bit hardware, I would only
unroll loops of eight iterations or fewer.

~~~
aston
From the man himself (<http://www.lysator.liu.se/c/duffs-device.html>):

"Transformations like this can only be justified by measuring the resulting
code. Be careful when you use this thing that you don't unwind the loop so
much that you overflow your machine's instruction cache. Don't try to be
smarter than an over-clever C compiler that recognizes loops that implement
block move or block clear and compiles them into machine idioms."
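
In that spirit, a rough way to measure rather than guess might look like
this. Everything here is an assumption for illustration: `nowMs`, `sumPlain`,
`sumUnrolled4` and `timeIt` are hypothetical helpers, the unroll factor of 4
is arbitrary, and a single timing loop like this ignores JIT warm-up, so
treat any numbers as a hint only.

```javascript
// Assumption: use performance.now() where it exists, Date.now() otherwise.
function nowMs() {
  return (typeof performance !== "undefined" && typeof performance.now === "function")
    ? performance.now()
    : Date.now();
}

// Plain loop: one comparison per element.
function sumPlain(xs) {
  let total = 0;
  for (let i = 0; i < xs.length; i++) total += xs[i];
  return total;
}

// Unrolled by 4 (arbitrary factor), with a cleanup loop for the leftovers.
// Note: with non-integer floats the regrouped additions may round differently.
function sumUnrolled4(xs) {
  let total = 0;
  let i = 0;
  const limit = xs.length - (xs.length % 4);
  for (; i < limit; i += 4) {
    total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3];
  }
  for (; i < xs.length; i++) total += xs[i];
  return total;
}

// Runs fn(xs) `runs` times; returns elapsed milliseconds plus the accumulated
// result, so the engine cannot discard the work as unused.
function timeIt(fn, xs, runs) {
  const start = nowMs();
  let sink = 0;
  for (let r = 0; r < runs; r++) sink += fn(xs);
  return { ms: nowMs() - start, sink: sink };
}
```

Call `timeIt(sumPlain, data, 1000)` and `timeIt(sumUnrolled4, data, 1000)` on
the same array and compare the `ms` fields. If the difference is within
run-to-run noise, the unrolling isn't buying anything.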

~~~
xirium
> Be careful when you use this thing that you don't unwind the loop so much
> that you overflow your machine's instruction cache.

I considered this case shortly after posting. Specifically, my rule of thumb
fails miserably on the 68020, which has a very small instruction cache.

