Tom Duff's device did that because he was doing MMIO. You should not do this today when you don't want MMIO [I know you're not suggesting it, but just in case anybody reading thinks it's clever]. Your compiler is very capable of doing an actual copy quickly, so tell it that's what you want instead of writing gymnastics like Duff's device.
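To make "tell it that's what you want" concrete, here is a minimal C++ sketch. The names `copy_ints` and `write_all_to_register` are made up for illustration; the first is an ordinary memory-to-memory copy, the second is the MMIO-style case where every write goes to the same address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: ordinary copy between two non-overlapping buffers.
// Stating the intent with std::copy_n leaves unrolling or vectorizing
// to the compiler and standard library.
void copy_ints(int* dst, const int* src, std::size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
    std::copy_n(src, count, dst);
}

// Hypothetical helper: the MMIO-style case, where every value is written
// to the same volatile location. A plain loop is enough to express it.
void write_all_to_register(volatile int* reg, const int* src, std::size_t count) {
    assert(reg != nullptr && (count == 0 || src != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        *reg = src[i];
    }
}
```

Neither function unrolls anything by hand; whether the compiler then unrolls depends on the target and optimization level.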
Aw, just needs a better compiler (with a 6502 target) :D
Jason Turner's CppCon 2021 talk, "Your New Mental Model of constexpr", spends half the presentation on a C64 program (for practical reasons running on an emulator rather than an actual C64), because most of the heavy lifting is done by the C++20 compiler. https://youtu.be/MdrfPSUtMVM?t=1422
Now, Jason's approach is not going to beat hand-crafted 6502 machine code in a fair fight, but he often doesn't need to fight fair, and that's the point of his talk.
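As a rough illustration of "not fighting fair" (this is not code from the talk, and `sum_of_squares` is a made-up example), a `constexpr` function lets the compiler do the arithmetic so the target only has to load the answer:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: an ordinary loop that is also usable in
// constant expressions.
constexpr std::uint32_t sum_of_squares(std::uint32_t n) {
    std::uint32_t total = 0;
    for (std::uint32_t i = 1; i <= n; ++i) {
        total += i * i;
    }
    return total;
}

// Evaluated by the compiler; the generated program only holds the constant.
constexpr std::uint32_t kPrecomputed = sum_of_squares(10);
static_assert(kPrecomputed == 385, "computed at compile time");

// The same function still works on values known only at run time.
std::uint32_t sum_of_squares_runtime(std::uint32_t n) {
    assert(n <= 1000);  // keeps the 32-bit total from overflowing
    return sum_of_squares(n);
}
```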
That said, partially unrolled loops of the kind Duff's device hand-rolls can be expressed cleanly with a nice performance-not-safety feature of WUFFS called "Iterate loops":
https://github.com/google/wuffs/blob/main/doc/note/iterate-l...
Well, I say performance, not safety; as always, they want both. You could safely write just the never-unrolled case, but Iterate loops let you express a much faster special case while knowing the compiler will fix things up properly no matter what.
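For readers who haven't seen the pattern, here is a hand-written C++ sketch of the shape being described: an unrolled fast path followed by a one-at-a-time fix-up for the leftovers. This is not WUFFS syntax, and `sum_bytes` and the unroll factor of 4 are assumptions made for the example. The point of the WUFFS feature, as I understand it, is that you don't have to write or verify the bookkeeping below by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: sum a byte buffer with a 4x unrolled main loop.
std::uint32_t sum_bytes(const std::uint8_t* data, std::size_t len) {
    assert(len == 0 || data != nullptr);
    std::uint32_t total = 0;
    std::size_t i = 0;

    // Fast path: four elements per iteration while at least four remain.
    // i never exceeds len, so len - i cannot wrap around.
    for (; len - i >= 4; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }

    // Fix-up: the 0 to 3 leftover elements, one at a time.
    for (; i < len; ++i) {
        total += data[i];
    }
    return total;
}
```

Deleting the fast path leaves a correct, never-unrolled loop, which is the "you could safely just write that" case from the comment.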