
A fun optimization trick from rsync - luu
https://blog.plover.com/prog/switch-case-optimization.html
======
vortico
I don't like it. You should put the logic where the logic should be, so it's
clear to see what was intended.

    
    
        typedef int (*MethodFunc)();
    
        int doMethod() {  /* 0 on success, -1 if every method fails */
            static int methodId = 0;
    
            static const MethodFunc methodFuncs[] = {
        #ifdef METHOD_1_CAN_COMPILE
                method1,
        #endif
                method2,
                ...,
                NULL
            };
    
            while (methodFuncs[methodId]) {
                int success = (methodFuncs[methodId])();
                if (success)
                    return 0;
                methodId++;
            }
    
            return -1;
        }

~~~
stouset
If all of the methodFuncs fail, this will attempt to jump to whatever address
happens to be in memory after the last element of methodFuncs. You've
forgotten to stop iterating.

~~~
ahoka
There is a terminating NULL at the end of the array to stop the iteration.

~~~
jbverschoor
oh my... that never went wrong

~~~
dsamarin
A terminating null element or logical equivalent in an array is helpful in
lots of situations. As one example it simplifies the logic of looking for an
element in an array and performing some special operation if it was not found.
No bool found variable necessary.
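
A minimal sketch of that sentinel idea, assuming a NULL-terminated array of strings. The names `find_name` and `names` are hypothetical and not from the article. The loop stops on either a match or the terminator, so the returned slot doubles as the "found" flag:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: search a NULL-terminated array of strings.
 * Returns a pointer to the matching slot, or to the terminating NULL
 * slot when nothing matched, so no separate "found" flag is needed. */
const char *const *find_name(const char *const *names, const char *wanted)
{
    const char *const *p = names;
    while (*p != NULL && strcmp(*p, wanted) != 0)
        p++;
    return p;
}
```

A caller checks `*find_name(names, "x") == NULL` to handle the not-found case. As noted in the replies, this only works as long as nobody removes the terminator.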

~~~
jbverschoor
Well yes, but it can easily be removed by accident, and is easy to overlook.
See the parent comment.

------
matthewaveryusa
That's not clever, that's language abuse and horrific. You could do this in
so many other ways that there is absolutely no point in abusing #includes
mid-source-code (not to mention the code smell around statics).

Now, as a working curiosity, this is pretty damn cool.

~~~
nwmcsween
Not really, it's just a static var on a switch statement; nothing really bad
about it. In fact, it's a semi-nice way to handle a simple state machine.

~~~
repsilat
Not _tooo_ uncommon either. It's a "reasonable" way to write generators in C
(if you're a masochistic Python programmer, I guess.)

Switch on the last return point at the top of the function, and make all
variables that need to persist between calls `static`.

(I've written it before, and I swear I didn't invent it... Maybe it's just a
short skip from Duff's Device? Still, using it to pick up _before_ the last
return instead of after is a new trick for me!)
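
A minimal sketch of the generator pattern described above, assuming a hypothetical `next_value` function that is not taken from rsync or the article. It resumes _after_ the last return, which is the usual generator behaviour, whereas the rsync trick resumes _before_ it:

```c
/* Hypothetical generator: yields 0, 1, ..., limit-1 across calls,
 * then -1 once exhausted. State lives in static variables, so it is
 * neither reentrant nor thread-safe. */
int next_value(int limit)
{
    static int resume_point = 0;  /* where the previous call returned */
    static int i;                 /* loop counter persists between calls */

    switch (resume_point) {
    case 0:
        for (i = 0; i < limit; i++) {
            resume_point = 1;
            return i;             /* "yield" */
    case 1:;                      /* the next call resumes here */
        }
        resume_point = 2;
        /* fall through */
    case 2:
        break;
    }
    return -1;
}
```

The `case 1:` label sits inside the `for` body, which is legal C and is the same jump-into-a-loop move that Duff's Device relies on.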

~~~
dimatura
Yes, it's basically the idea in Duff's device. Protothreads
([http://dunkels.com/adam/pt/](http://dunkels.com/adam/pt/)) uses a pretty
similar trick.

~~~
repsilat
Ah, that's exactly the trick I was thinking of. I think I learned it from [1],
but I'm pretty sure Protothreads predates that post. Thanks for the link!

1:
[https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html](https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html)

------
usefulcat
An interesting trick, but it seems like there would be a simpler way, like
having set_the_mtime be a function pointer. Initially it would point to an
implementation that figures out which method works and sets the function
pointer to point to a particular implementation.
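
A rough sketch of that function-pointer idea, with everything except the name `set_the_mtime` being hypothetical. For brevity it treats any failure as "unsupported", which is coarser than the ENOSYS check the article describes:

```c
#include <stddef.h>

typedef int (*SetMtimeFn)(const char *path);

/* Hypothetical stand-ins for real implementations; 0 means success. */
int set_mtime_a(const char *path) { (void)path; return -1; } /* pretend: unsupported */
int set_mtime_b(const char *path) { (void)path; return 0; }  /* pretend: works */

int set_mtime_probe(const char *path);

/* Starts out pointing at the probe; the probe rebinds it on first success. */
SetMtimeFn set_the_mtime = set_mtime_probe;

int set_mtime_probe(const char *path)
{
    static const SetMtimeFn candidates[] = { set_mtime_a, set_mtime_b };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (candidates[i](path) == 0) {
            set_the_mtime = candidates[i];  /* later calls skip the probe */
            return 0;
        }
    }
    return -1;  /* nothing worked; keep probing on the next call */
}
```

Callers always write `set_the_mtime(path)`. The first call goes through the probe, and later calls go straight to whichever implementation succeeded.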

~~~
beefsack
Sounds really clever, but also has the potential to be really opaque and
confusing. On the other hand, the trick in the article is quite verbose.

Can be tricky balancing terseness with readability.

~~~
usefulcat
> has the potential to be really opaque and confusing

Yes, but... compared to the switch and preprocessor hacks described in the article?
Granted it's subjective but I'd find a function pointer that's assigned to
once to be way easier to comprehend.

~~~
carlmr
Yeah, I think both methods are somewhat opaque, and the one described here is
opaque + hacky. I wouldn't let this through code review.

------
alecbenzer
Awesome, that's horrifying.

One concrete issue with this technique: it doesn't "loop". Meaning, for
example, if initially method #3 doesn't work but method #4 does, it'll settle
on method #4. If method #3 later starts working, it won't switch back. Worse,
if method #4 stops working, it'll just always fail.

I imagine this isn't an issue for rsync's use but could be a concern if
applying it more generally.

~~~
eridius
According to the article, a step isn't skipped because it had a transient
failure, it's skipped because it returns a special error code ENOSYS that
means the syscall hasn't even been implemented, and therefore there's no point
in ever trying it again.
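
To make that distinction concrete, here is a sketch (not rsync's actual code) in which only ENOSYS advances the static step. The two methods and the counters are hypothetical stand-ins:

```c
#include <errno.h>

/* Hypothetical methods: return 0 on success, -1 with errno set on failure. */
int calls_to_method_0 = 0;
int method_1_should_fail = 0;

int method_0(void) { calls_to_method_0++; errno = ENOSYS; return -1; } /* not implemented */
int method_1(void)
{
    if (method_1_should_fail) { errno = EACCES; return -1; }
    return 0;
}

int set_the_mtime_sketch(void)
{
    static int switch_step = 0;  /* first method not yet ruled out */

    switch (switch_step) {
    case 0:
        if (method_0() == 0)
            return 0;
        if (errno != ENOSYS)
            return -1;           /* transient failure: retry method 0 next time */
        switch_step = 1;         /* ENOSYS: never try method 0 again */
        /* fall through */
    case 1:
        if (method_1() == 0)
            return 0;
        return -1;
    }
    return -1;
}
```

A later transient failure of `method_1` makes that call fail but does not change `switch_step`, so the next call tries `method_1` again.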

~~~
carlmr
That's correct. Although it's a very specific use-case.

------
ThePhysicist
Andrew Tridgell’s thesis is also a very interesting read, as rsync has some
pretty innovative algorithms that make the file transfer and remote diffing
fast:

[https://rsync.samba.org/~tridge/phd_thesis.pdf](https://rsync.samba.org/~tridge/phd_thesis.pdf)

------
ecma
Thanks, I hate it. I'm a bit bemused about this bizarro case_N.h trickery. I'm
also almost positive __COUNTER__ would be adequate and entirely remove the
need for __LINE__ as suggested in the addendum which is likely to be quite
large in any non-trivial source unit. I mean, something gcc this way comes I
guess...

~~~
scatters
__COUNTER__ is not standardized. Repeated inclusion is.

------
netheril96
So thread unsafe?

~~~
romed
Note that if this were C++ (11 or later) this could be done with the static
variable in a thread-safe way, since function-scoped static variables
initialized by functions are guaranteed to be initialized exactly once. So
this code could be something like:

    
    
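      // Try various ways of setting mtime until it works
      // (presumably returns the index of the method that worked).
      int set_mtime_starting_from(int starting_from);

      int set_the_mtime(...) {
        // Initialized exactly once, even with concurrent callers (C++11).
        static int starting_from = set_mtime_starting_from(0);
        return set_mtime_starting_from(starting_from);
      }

      int set_mtime_starting_from(int starting_from) {
        // horrible switch statement
      }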

~~~
repsilat
In C++ you could also make an object (per thread?) and use member vars instead
of static vars.

Also, if you're going to use static vars in multithreaded contexts doing this,
better to write

    
    
       thing = 5;
    

(via that macro magic) than `thing++`. Then at least nothing _too_ bad should
happen...
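
A small sketch of why the constant store is safer, using C11 atomics. Using atomics is an assumption on my part, and both helper names are hypothetical. If two threads both conclude that method 0 is unsupported, storing `1` twice is harmless, while incrementing twice skips a method that was never tried:

```c
#include <stdatomic.h>

/* Called by a thread that found method 0 to be unsupported. */
void skip_method_0_by_assignment(atomic_int *step)
{
    atomic_store(step, 1);       /* idempotent: repeating it changes nothing */
}

void skip_method_0_by_increment(atomic_int *step)
{
    atomic_fetch_add(step, 1);   /* not idempotent: each caller advances the step */
}
```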

------
jbverschoor
So without actually looking into anything: what if you have a directory
structure which is built out of 2 mounts, one local fs and one network
mounted? Then let's say method 5 works on one and method 1 works on the other,
and those are the only ones that work. Then it would basically fail, depending
on the order of directories enumerated, even though there is a supported method.

~~~
de_watcher
I haven't looked at the code, but the ENOSYS error is more about the
capabilities of the kernel. So when we eventually start swapping entire
kernels at runtime then it'll break.

------
bacon_waffle
Edit: d'oh - I misread, the addendum is doing exactly this...

I don't understand the need for the macro-fu to generate the case values. The
use of switch_step++ in each previous case seems to drive that, but couldn't
the same be accomplished by setting switch_step to a constant?

    
    
      static int switch_step = 0;
      switch(switch_step) { // Falls through between cases
        default:
    
        #if METHOD_0_AVAILABLE
          switch_step = 0;
        case 0:
          if (method_0_works(...))
            break;
        #endif /* METHOD_0_AVAILABLE */
    
        #if METHOD_1_AVAILABLE
          switch_step = 1;
        case 1:
        ...
      }

------
OskarS
So, slightly off topic, but one thing from C that I really miss in other
languages is subroutine-local static variables. It always annoys me in Java or
C# that you can't really make a static variable that's scoped to a single
method or function, you have to scope them to the entire class. It's common
enough that a private static variable is only needed in a single method or
function, so why not do it the way C does it?
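
For readers who haven't used the feature, a minimal illustration with a hypothetical `next_id` function. The variable is invisible outside the function but keeps its value between calls:

```c
/* Hypothetical example: `counter` can only be named inside next_id(),
 * yet it has static storage duration and survives between calls. */
int next_id(void)
{
    static int counter = 0;  /* initialized once, not on every call */
    return ++counter;
}
```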

~~~
VMG
In js you can use `this.x` or `this.methodname.x`

------
Jedd
rsync is an amazing tool, full of delightful secrets.

One thing it _doesn't_ have natively (unless it's really well hidden) is the
ability to look at two directory structures, and generate a delta of the two
out to a separate location (say a USB stick).

A very useful feature for backups where you've got good enough (to run an
rsync --dry-run) network connectivity, but not good enough to actually do the
transfer.

~~~
lazylizard
Can you run rsync --dry-run and pipe the output to a file?

~~~
Jedd
Yup, that's what I ended up doing [1] - but was mildly surprised it wasn't
built in, and very surprised that I couldn't find anyone else's solution.

[1] [https://jeddi.org/b/2016/05/28/rsync-from-a-to-b-via-usb/](https://jeddi.org/b/2016/05/28/rsync-from-a-to-b-via-usb/)

------
waingake
I'd be interested if someone could show me an FP-friendly solution to this
problem.

------
toolslive
Some compilers turn large switches into lookup tables. Using side effects like
this, however, makes the compiler shy away from this optimisation and compile
it literally. So it's probably not a good idea anymore.

~~~
ygra
I'm not sure switch performance is a relevant concern for setting the last
modification time of a file.

