One (non-generalizable) solution is to avoid using writable function pointers, t...

chocolatebunny · on Nov 14, 2018

Is there a reason why you did it that way, other than to avoid writable function pointers?

Also, that just sounds like how switch/case is sometimes implemented. Have you considered just using a switch/case statement instead of manually managing function pointers and the like?

derf_ · on Nov 15, 2018

That was the main reason.

It also means that you don't need to pass around (a pointer to) a big table of function pointers throughout all of your functions that need to use accelerated routines. Instead you just pass a single integer index. But that is pretty minor.

It can be pretty similar to a switch in practice, but I think there are a few differences. One is function argument handling. Each version of an accelerated routine has to be an independent function, compiled in a separate compilation unit, because you have to limit the instruction sets available. So you will duplicate all of the function call setup overhead in each branch of the switch. This probably gets optimized away by a decent compiler. Even if it does, you still wind up essentially duplicating the function table at every call site. That does not seem likely to get optimized away. You could dispatch to a common thunk which selects the accelerated routine to call, but now you have two function calls instead of one. I am also not sure that switch statements handle the default case as cheaply as doing a single bitwise AND with a small, compile-time constant.
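For concreteness, here is a minimal sketch in C of the kind of dispatch described above: a read-only table of function pointers, indexed by a small integer that is masked with a compile-time constant. All of the names (`sum_fn`, `SUM_TABLE`, `sum_dispatch`, `NUM_IMPLS`) are hypothetical, and both routines are plain C stand-ins; in a real build each accelerated version would live in its own compilation unit with its own instruction-set flags.

```c
#include <stdint.h>

/* Hypothetical signature shared by every version of the routine. */
typedef uint32_t (*sum_fn)(const uint8_t *data, int n);

/* Portable fallback. */
static uint32_t sum_c(const uint8_t *data, int n) {
    uint32_t s = 0;
    for (int i = 0; i < n; i++) s += data[i];
    return s;
}

/* Stand-in for an accelerated version (assumption: a real one would use
   SIMD and be compiled separately with the matching -m flags). */
static uint32_t sum_unrolled(const uint8_t *data, int n) {
    uint32_t s = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4)
        s += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    for (; i < n; i++) s += data[i];
    return s;
}

/* Table size is a power of two so the index can be bounded with one AND. */
#define NUM_IMPLS 4
#define IMPL_MASK (NUM_IMPLS - 1)

/* const, so the table can be placed in read-only memory. Unused slots
   point at the fallback, so every masked index is a valid target. */
static const sum_fn SUM_TABLE[NUM_IMPLS] = {
    sum_c, sum_unrolled, sum_c, sum_c
};

/* Callers pass around only the integer index, never a function pointer. */
uint32_t sum_dispatch(unsigned impl, const uint8_t *data, int n) {
    return SUM_TABLE[impl & IMPL_MASK](data, n);
}
```

Even if `impl` gets corrupted, the mask guarantees the call lands on one of the entries in the table, which is the property a writable function pointer does not give you.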

But if you have a use of function pointers that can be replaced by a simple switch statement, that is probably the better approach.

pjmlp · on Nov 15, 2018

Your approach is kind of how indirect function calls are implemented in WebAssembly, if I am not mistaken.

tedunangst · on Nov 15, 2018

Using a switch requires that you statically determine and list every target. Implementing your own jump table lets you build it at runtime.

tedunangst · on Nov 15, 2018

Even in the dynamic case, you can still use a (possibly malloc'ed) array and verify that the index is valid, which is a nice improvement over leaving function pointers lying around in the vicinity of buffers.
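A rough sketch of that dynamic variant in C, assuming a heap-allocated table that is filled in at runtime and a call path that rejects any index outside the registered range. The `jump_table` struct, the `jt_*` functions, and the two example handlers are all made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handler signature. */
typedef int (*handler_fn)(int);

struct jump_table {
    handler_fn *entries; /* heap-allocated array of targets */
    size_t count;        /* number of registered targets */
    size_t cap;          /* allocated capacity */
};

/* Allocate room for up to cap targets. Returns 0 on success, -1 on failure. */
int jt_init(struct jump_table *t, size_t cap) {
    t->entries = calloc(cap, sizeof *t->entries);
    t->count = 0;
    t->cap = t->entries ? cap : 0;
    return t->entries ? 0 : -1;
}

/* Register a target at runtime. Returns its index, or -1 if the table is full. */
long jt_register(struct jump_table *t, handler_fn fn) {
    if (fn == NULL || t->count >= t->cap) return -1;
    t->entries[t->count] = fn;
    return (long)t->count++;
}

/* Call target idx only if the index is in range. Returns 0 on success, -1 otherwise. */
int jt_call(const struct jump_table *t, size_t idx, int arg, int *out) {
    if (idx >= t->count) return -1;
    *out = t->entries[idx](arg);
    return 0;
}

void jt_free(struct jump_table *t) {
    free(t->entries);
    t->entries = NULL;
    t->count = t->cap = 0;
}

/* Example targets. */
static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }
```

The rest of the program stores only the integer returned by `jt_register`, so an overflow in a nearby buffer can at worst select a different registered handler, not an arbitrary address. The table itself is still writable here; this sketch does not try to protect it.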