Function Dispatch Tables in C (2019) (alicegoldfuss.com)
35 points by blacksqr 3 days ago | 50 comments





There is a vague reference to performance and runtime efficiency, as if function pointers will magically help. The syscall table is said to use function pointers for performance reasons?? I don't think the author understands this. The syscall ABI needs it because a syscall is not a direct call: it's a generic mechanism for indexing into a list of functions whose implementations are completely abstract to the caller...

I guarantee you that for the short examples in this article, using function pointers will slow everything down. The function pointer approach destroys locality, is bad for speculative execution, and increases code size... A good optimizer would detect this and replace the indirect call with inlining. It's rare that you really need dynamic dispatch, and there is a high performance cost to it.

The times when you need dynamic dispatch are those where you can't make a direct call because the callee is not known in advance - things like generic callback mechanisms.
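A minimal sketch of such a generic callback mechanism, where the library genuinely cannot know the callee in advance (the names here are made up for illustration, not from the article):

    #include <stdio.h>

    /* The library defines the callback type and stores whatever the user registers. */
    typedef void (*event_handler)(int event_code, void *user_data);

    static event_handler g_handler;
    static void *g_user_data;

    void register_handler(event_handler h, void *user_data) {
        g_handler = h;
        g_user_data = user_data;
    }

    void emit_event(int code) {
        if (g_handler)                 /* indirect call: the callee is unknown here */
            g_handler(code, g_user_data);
    }

    /* User code, possibly in another translation unit entirely. */
    static void my_handler(int code, void *user_data) {
        printf("got event %d (%s)\n", code, (const char *)user_data);
    }

    int main(void) {
        register_handler(my_handler, "demo");
        emit_event(42);
        return 0;
    }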


What's the performance cost exactly?

Modern CPUs, with pipelining, branch prediction, speculative execution, and caching, do best with fewer jumps, small code size, predictable jump targets, and sequential access. A tight loop of "jump to the address I just loaded from this 64-bit quantity" throws a wrench right into the middle of that and will make those mechanisms stall.

Imagine a sorting algorithm that jumps into a callback to compare the elements, versus a simple integer compare (1-2 instructions) inlined. In the former you are jumping all over town, to jump targets loaded from RAM (not predictable); in the latter, one very small integer compare instruction sits inline with the larger algorithm.

See also: qsort in C vs std::sort in C++
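To make the qsort side concrete, this is the usual shape (standard library only); every comparison goes through the pointer, so it normally cannot be inlined:

    #include <stdlib.h>

    /* qsort only sees the comparison through this pointer, so each of the
       O(n log n) comparisons is an indirect call it (normally) can't inline. */
    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    void sort_ints(int *v, size_t n) {
        qsort(v, n, sizeof v[0], cmp_int);
    }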


I think you are constructing a straw man. The jump-table approach is just as likely a candidate for the optimization you claim will get in the way as the code using if/switch statements. In fact, the compiler can see that simple call through the jump table and optimize it even better - maybe even by inlining.

The question is: how are you sure this isn't happening? The answer is, you're not - unless you examine the code your compiler is emitting, such claims are specious.


> In fact, the compiler can see that simple call through the jump table and optimize it even better - maybe even by inlining.

Sure, if the table is `static const`. Otherwise the compiler generates code to load the table address, add the offset, retrieve the address and then jump to it.

If the table is not defined in the current translation unit[1], or if it is visible but not const[2], or if it is visible but not static[3][4], the compiler cannot safely inline anything.

If the table is: a) defined in the current function, b) does not escape the scope of the current function, c) qualified with static, and d) qualified with const -

Then sure, the compiler can figure out the best optimisation. Luckily, for benchmark purposes all of the above will be true so your benchmark program will show a performance benefit to the jump table.

Real programs with jump tables are never able to satisfy all those conditions.

[1] If defined elsewhere, the compiler doesn't know the contents of the table and so has to generate code to look it up every time.

[2] If it isn't const, the compiler cannot assume that the entries in the table don't change, and thus has to generate code to look it up every time.

[3] If the table is visible outside of the current scope or translation unit, the compiler cannot assume that the entries don't change, and once again has to generate code to perform the lookup every time.

[4] If the table is defined in the current function but not static, the compiler will have to generate code to populate that table every single time the function is entered.
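For reference, a sketch of the one shape that satisfies a) through d) (the names are illustrative); anything outside this pattern forces the load-offset-load-jump sequence described in [1]-[4]:

    static int add(int a, int b) { return a + b; }
    static int sub(int a, int b) { return a - b; }

    int dispatch(unsigned choice, int a, int b) {
        /* a) defined in the current function, b) never escapes it,
           c) static, d) const: the compiler is free to constant-fold the
           lookup and may inline the callee when `choice` is known. */
        static int (* const table[2])(int, int) = { add, sub };
        return table[choice % 2](a, b);
    }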


That's exactly why you're supposed to examine the compiler output, and doing this will tell you that function pointer jump tables are often not optimized that way. There's a good reason why emulators / virtual machines use switch-case or computed-goto instead of function pointer jump tables for opcode dispatching.

Even examining the compiler output is not enough: it may well be that code A is faster than code B in some hardware environments/workloads and slower in others, and the only way to know for sure which is faster in your scenario is to actually measure both.

I did the legwork. To me, the call_table() approach looks faster - no cmp/jumps to pass through every time, and simpler code for debugging.

--- C-code "exer.c" file containing the different techniques:

    #include <stdio.h>
    
    int add(int first, int second);
    int sub(int first, int second);
    int mult(int first, int second);
    int divide(int first, int second);
    
    typedef int math_function(int first, int second);
    
    /* Global dispatch table, visible to the whole program. */
    math_function *my_array[4] = {
        add,
        sub,
        mult,
        divide
    };
    
    int add(int first, int second)    { return first + second; }
    int sub(int first, int second)    { return first - second; }
    int mult(int first, int second)   { return first * second; }
    int divide(int first, int second) { return first / second; }
    
    int inline_with_switch(void) {
        int first = 2, second = 3, choice = 1, result = 0;
    
        switch (choice) {
            case 0: result = first + second; break;
            case 1: result = first - second; break;
            case 2: result = first * second; break;
            case 3: result = first / second; break;
        }
    
        printf("Result is %d\n", result);
        return 0;
    }
    
    int functions_with_switch(void) {
        int first = 2, second = 3, choice = 1, result = 0;
    
        switch (choice) {
            case 0: result = add(first, second);    break;
            case 1: result = sub(first, second);    break;
            case 2: result = mult(first, second);   break;
            case 3: result = divide(first, second); break;
        }
    
        printf("Result is %d\n", result);
        return 0;
    }
    
    int call_table(void) {
        int first = 2, second = 3, choice = 1, result;
    
        /* Local dispatch table; shadows the global one above. */
        math_function *my_array[4] = {
            add,
            sub,
            mult,
            divide
        };
    
        result = my_array[choice](first, second);
    
        printf("Result is %d\n", result);
        return 0;
    }
    
    int main(int argc, char *argv[])
    {
        call_table();
        inline_with_switch();
        functions_with_switch();
        return 0;
    }

            
    
--- Assembly Code (produced with gcc -S exer.c -o exer.s):

      .file "exer.c"
      .text
      .def printf; .scl 3; .type 32; .endef
      .seh_proc printf
      printf:
     pushq %rbp
      .seh_pushreg %rbp
      pushq %rbx
      .seh_pushreg %rbx
      subq $56, %rsp
      .seh_stackalloc 56
      leaq 48(%rsp), %rbp
      .seh_setframe %rbp, 48
      .seh_endprologue
      movq %rcx, 32(%rbp)
      movq %rdx, 40(%rbp)
      movq %r8, 48(%rbp)
      movq %r9, 56(%rbp)
      leaq 40(%rbp), %rax
      movq %rax, -16(%rbp)
      movq -16(%rbp), %rbx
      movl $1, %ecx
      movq __imp___acrt_iob_func(%rip), %rax
      call *%rax
      movq %rbx, %r8
      movq 32(%rbp), %rdx
      movq %rax, %rcx
      call __mingw_vfprintf
      movl %eax, -4(%rbp)
      movl -4(%rbp), %eax
      addq $56, %rsp
      popq %rbx
      popq %rbp
      ret
      .seh_endproc
      .globl my_array
      .data
      .align 32
      my_array:
     .quad add
      .quad sub
      .quad mult
      .quad divide
      .text
      .globl add
      .def add; .scl 2; .type 32; .endef
      .seh_proc add
      add:
     pushq %rbp
      .seh_pushreg %rbp
      movq %rsp, %rbp
      .seh_setframe %rbp, 0
      .seh_endprologue
      movl %ecx, 16(%rbp)
      movl %edx, 24(%rbp)
      movl 16(%rbp), %edx
      movl 24(%rbp), %eax
      addl %edx, %eax
      popq %rbp
      ret
      .seh_endproc
      .globl sub
      .def sub; .scl 2; .type 32; .endef
      .seh_proc sub
      sub:
     pushq %rbp
      .seh_pushreg %rbp
      movq %rsp, %rbp
      .seh_setframe %rbp, 0
      .seh_endprologue
      movl %ecx, 16(%rbp)
      movl %edx, 24(%rbp)
      movl 16(%rbp), %eax
      subl 24(%rbp), %eax
      popq %rbp
      ret
      .seh_endproc
      .globl mult
      .def mult; .scl 2; .type 32; .endef
      .seh_proc mult
      mult:
     pushq %rbp
      .seh_pushreg %rbp
      movq %rsp, %rbp
      .seh_setframe %rbp, 0
      .seh_endprologue
      movl %ecx, 16(%rbp)
      movl %edx, 24(%rbp)
      movl 16(%rbp), %eax
      imull 24(%rbp), %eax
      popq %rbp
      ret
      .seh_endproc
      .globl divide
      .def divide; .scl 2; .type 32; .endef
      .seh_proc divide
      divide:
     pushq %rbp
      .seh_pushreg %rbp
      movq %rsp, %rbp
      .seh_setframe %rbp, 0
      .seh_endprologue
      movl %ecx, 16(%rbp)
      movl %edx, 24(%rbp)
      movl 16(%rbp), %eax
      cltd
      idivl 24(%rbp)
      popq %rbp
      ret
      .seh_endproc
      .section .rdata,"dr"
      .LC0:
     .ascii "Result is %d\12\0"
      .text
      .globl inline_with_switch
      .def inline_with_switch; .scl 2; .type 32; .endef
      .seh_proc inline_with_switch
      inline_with_switch:                             <------- inline_with_switch()
     pushq %rbp
      .seh_pushreg %rbp
      movq %rsp, %rbp
      .seh_setframe %rbp, 0
      subq $48, %rsp
      .seh_stackalloc 48
      .seh_endprologue
      movl $2, -8(%rbp)
      movl $3, -12(%rbp)
      movl $1, -16(%rbp)
      cmpl $3, -16(%rbp)
      je .L12
      cmpl $3, -16(%rbp)
      jg .L13
      cmpl $2, -16(%rbp)
      je .L14
      cmpl $2, -16(%rbp)
      jg .L13
      cmpl $0, -16(%rbp)
      je .L15
      cmpl $1, -16(%rbp)
      je .L16
      jmp .L13
      .L15:
     movl -8(%rbp), %edx
      movl -12(%rbp), %eax
      addl %edx, %eax
      movl %eax, -4(%rbp)
      jmp .L13
      .L16:
     movl -8(%rbp), %eax
      subl -12(%rbp), %eax
      movl %eax, -4(%rbp)
      jmp .L13
      .L14:
     movl -8(%rbp), %eax
      imull -12(%rbp), %eax
      movl %eax, -4(%rbp)
      jmp .L13
      .L12:
     movl -8(%rbp), %eax
      cltd
      idivl -12(%rbp)
      movl %eax, -4(%rbp)
      nop
      .L13:
     movl -4(%rbp), %eax
      movl %eax, %edx
      leaq .LC0(%rip), %rax
      movq %rax, %rcx
      call printf
      movl $0, %eax
      addq $48, %rsp
      popq %rbp
      ret
      .seh_endproc
      .globl functions_with_switch
      .def functions_with_switch; .scl 2; .type 32; .endef
      .seh_proc functions_with_switch
      functions_with_switch:                             <------- functions_with_switch()
     pushq %rbp
      .seh_pushreg %rbp
      movq %rsp, %rbp
      .seh_setframe %rbp, 0
      subq $48, %rsp
      .seh_stackalloc 48
      .seh_endprologue
      movl $2, -8(%rbp)
      movl $3, -12(%rbp)
      movl $1, -16(%rbp)
      cmpl $3, -16(%rbp)
      je .L19
      cmpl $3, -16(%rbp)
      jg .L20
      cmpl $2, -16(%rbp)
      je .L21
      cmpl $2, -16(%rbp)
      jg .L20
      cmpl $0, -16(%rbp)
      je .L22
      cmpl $1, -16(%rbp)
      je .L23
      jmp .L20
      .L22:
     movl -12(%rbp), %edx
      movl -8(%rbp), %eax
      movl %eax, %ecx
      call add
      movl %eax, -4(%rbp)
      jmp .L20
      .L23:
     movl -12(%rbp), %edx
      movl -8(%rbp), %eax
      movl %eax, %ecx
      call sub
      movl %eax, -4(%rbp)
      jmp .L20
      .L21:
     movl -12(%rbp), %edx
      movl -8(%rbp), %eax
      movl %eax, %ecx
      call mult
      movl %eax, -4(%rbp)
      jmp .L20
      .L19:
     movl -12(%rbp), %edx
      movl -8(%rbp), %eax
      movl %eax, %ecx
      call divide
      movl %eax, -4(%rbp)
      nop
      .L20:
     movl -4(%rbp), %eax
      movl %eax, %edx
      leaq .LC0(%rip), %rax
      movq %rax, %rcx
      call printf
      movl $0, %eax
      addq $48, %rsp
      popq %rbp
      ret
      .seh_endproc
      .globl call_table
      .def call_table; .scl 2; .type 32; .endef
      .seh_proc call_table
      call_table:                             <------- call_table()
     pushq %rbp
      .seh_pushreg %rbp
      movq %rsp, %rbp
      .seh_setframe %rbp, 0
      subq $80, %rsp
      .seh_stackalloc 80
      .seh_endprologue
      movl $2, -4(%rbp)
      movl $3, -8(%rbp)
      movl $1, -12(%rbp)
      leaq add(%rip), %rax
      movq %rax, -48(%rbp)
      leaq sub(%rip), %rax
      movq %rax, -40(%rbp)
      leaq mult(%rip), %rax
      movq %rax, -32(%rbp)
      leaq divide(%rip), %rax
      movq %rax, -24(%rbp)
      movl -12(%rbp), %eax
      cltq
      movq -48(%rbp,%rax,8), %r8
      movl -8(%rbp), %edx
      movl -4(%rbp), %eax
      movl %eax, %ecx
      call *%r8
      movl %eax, -16(%rbp)
      movl -16(%rbp), %eax
      movl %eax, %edx
      leaq .LC0(%rip), %rax
      movq %rax, %rcx
      call printf
      movl $0, %eax
      addq $80, %rsp
      popq %rbp
      ret
      .seh_endproc
      .def __main; .scl 2; .type 32; .endef
      .globl main
      .def main; .scl 2; .type 32; .endef
      .seh_proc main
      main:
     pushq %rbp
      .seh_pushreg %rbp
      movq %rsp, %rbp
      .seh_setframe %rbp, 0
      subq $32, %rsp
      .seh_stackalloc 32
      .seh_endprologue
      movl %ecx, 16(%rbp)
      movq %rdx, 24(%rbp)
      call __main
      call call_table
      call inline_with_switch
      call functions_with_switch
      nop
      addq $32, %rsp
      popq %rbp
      ret
      .seh_endproc
      .ident "GCC: (MinGW-W64 x86_64-posix-seh, built by Brecht Sanders) 11.2.0"
      .def __mingw_vfprintf; .scl 2; .type 32; .endef

Well, which is actually faster? Also, try benchmarking while there is a lot of context switching going on in the background (I vaguely recall a story about how some non-optimal-looking code actually fared better when the processor constantly kept flushing its caches/buffers, but I can't remember the exact details).

I measured it with a simple modification of the above program, calling each method 5,000,000 times and sampling the time taken.

With minimal optimizations (-O), the call_table is moderately slower. This is probably because gcc's default optimizer doesn't recognize the inline-ability of call_table(), whereas it does with the other methods.

With all optimizations on (-O3), all methods are performance-wise equivalent. The call_table() gets inlined, like the competition, and it performs just as well.

But that's the actual point: the call_table() method is just as qualified for optimization as the other methods - and in the end produces the same performance. So really, it's a matter of style and readability - which the call_table() wins over gigantic switch() statements, easily.


In a switch you see the index as a number in the case statement. Above a small number of entries, finding which value of choice maps to which function is going to be unreadable, no? You have to count the position of the function you want.

Small number? I think you mean a properly documented enum. ;)

But yeah, point taken. It's a style thing. I've just gotten allergic to wading through multiple thousands of lines of switch/case statements while trying to keep the state machine's 'intentions' in my head and comparing them with 'actuality' in the debugger.

I much prefer the smaller lines-of-code approach, even if others in this thread have now pointed out to me that switch() is faster - maintenance is a performance index too... especially when going back to code that was written more than a year ago, etc.


Predicting through jump tables is no small part of the purpose of indirect branch prediction. Two critical abilities are affected here: 1. good local decisions by the compiler's optimizer; 2. good execution by superscalar microarchitectures.

This discussion runs the risk of being as dated as advice to write macros rather than functions because of insufficiently aggressive inlining. As it stands today, though, anecdotally I've never seen a compiler optimizer inline through an indirect call in a jump table, and I have seen numerous examples of inlining through switch. I've also seen switch compile down to computed goto. Santana, in his analysis of indirect branch speculation, indicates that execution of these branches is less refined relative to direct branches.

Switch statements are a pretty good default.


I wrote this:

> A good optimizer would detect this and replace the indirect call with inlining

And it seems to be true for your experiment. These days this is a common optimization, provided the optimizer can figure it out. A link-time optimizer can even do it between compilation units. I remember when it wasn't possible or common.


Uh, this is a few years old, and perhaps the author was learning as they went. Still, I think this passage is a bit too much:

> Quick refresher: a pointer is a location in memory aka a memory address. That memory address can contain anything: an integer, a float, the middle of a string. It can also store the name of a function, also known as its label. A function's name is its address in memory.

A pointer is not a location in memory. A pointer is a value that represents a location in memory. It certainly cannot hold the name of a function, since names are not locations (and in C, names typically don't exist at run time at all).

It can hold the location of a function, which is represented symbolically in C by its name. These names are not called labels in C; that is a common term in assembly, though.


If you come at it from an assembly perspective, the explanation makes a lot of sense IMHO. But it could use a qualifier that the "name" is just a number and not the actual name as in the source code.

> Having 100 cases means we could potentially check 100 cases before selecting one. case statements are O(n), where n is the number of case/switch statements possible.

I was under the impression that most (all?) C compilers would optimize this use case into a jump table, turning the switch statement into an O(1) operation. Or is that a C++ feature I'm thinking of?


Quite a lot of effort goes into compiling switch statements. Jump tables and binary search are likely, as is a mixture of the two based on distribution of cases.

I haven't seen computed goto but wouldn't be surprised if some compilers have that as a third option.
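For example (worth checking in a disassembler or on godbolt, since the exact thresholds vary by compiler and version), a dense switch like the first one below is a typical jump-table candidate, while the sparse one usually becomes a comparison tree:

    /* Contiguous case values: compilers typically emit a bounds check plus
       an indexed jump table. */
    int dense(int op) {
        switch (op) {
            case 0: return 10;
            case 1: return 20;
            case 2: return 30;
            case 3: return 40;
            case 4: return 50;
            default: return -1;
        }
    }

    /* Widely scattered case values: a table would be huge, so compilers
       typically fall back to a tree of comparisons (roughly O(log n)). */
    int sparse(int op) {
        switch (op) {
            case 2:    return 10;
            case 97:   return 20;
            case 4000: return 30;
            default:   return -1;
        }
    }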


If you want fun, see what a C compiler does to your switch statement with -Os.

My opinion is the big advantage of dispatch tables in C is decoupling modules from each other.
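That decoupling is easiest to see with an "ops" struct, the same idea the Linux kernel uses for its file_operations tables; a sketch with invented names:

    #include <string.h>

    /* The shared interface: callers only ever see this struct. */
    struct backend_ops {
        int  (*open)(const char *name);
        int  (*read)(int handle, void *buf, int len);
        void (*close)(int handle);
    };

    /* One concrete module; more can be added without touching any caller. */
    static int  mem_open(const char *name)          { (void)name; return 1; }
    static int  mem_read(int h, void *buf, int len) { (void)h; memset(buf, 'x', len); return len; }
    static void mem_close(int h)                    { (void)h; }

    const struct backend_ops mem_backend = { mem_open, mem_read, mem_close };

    /* Caller code depends only on the ops table, never on a backend by name. */
    int slurp(const struct backend_ops *ops, const char *name, void *buf, int len) {
        int h = ops->open(name);
        if (h < 0) return -1;
        int n = ops->read(h, buf, len);
        ops->close(h);
        return n;
    }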

Tip: Taking things further: variadic function pointers.


For readability, this is reasonable. There may be a cost that it weakens some static analyzers, but I don't program in C enough to be sure.

For performance, this is not necessary because the compiler can optimize long switch-case flow into dispatch tables: https://godbolt.org/z/7WxEfc6YM


> For readability, this is reasonable.

With four branches, maybe. With more, an array of tens of function pointers makes it hard to tell which index maps to which function. OTOH, that can be somewhat mitigated (at least in C99) with designated initialisers.
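Something like this (C99 designated initialisers keyed by an enum, reusing the article's add/sub/mult/divide) keeps the index-to-function mapping readable even with many entries:

    int add(int, int); int sub(int, int); int mult(int, int); int divide(int, int);

    enum math_op { OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_COUNT };

    typedef int math_function(int first, int second);

    /* Each slot is labelled by its enum constant, so the order of the
       initialisers no longer matters for correctness or readability. */
    static math_function *const dispatch[OP_COUNT] = {
        [OP_ADD] = add,
        [OP_SUB] = sub,
        [OP_MUL] = mult,
        [OP_DIV] = divide,
    };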


Definitely can be useful, but keep in mind that performance will usually be worse for function tables.

Why?

And, if this is the case, why then do compilers turn switch statements into function tables?


A jump table made of function pointers has more runtime overhead than a switch-case jump table, because the latter jumps directly into machine-code snippets within the same function, and those snippets don't need function prologues/epilogues. Function pointers are also often an "optimization barrier" where the compiler can't inline to get rid of the prologue and epilogue.

Yup, though I do think in some cases if the indexes into the function table are known a sufficiently smart compiler can inline it anyway, even if the linkage isn't static: https://godbolt.org/z/6cY7zxT9W

While compilers do turn switch statements into jump tables, in this case, by storing function pointers and calling them, you add the extra call overhead of saving registers, setting up the stack, etc. in the function prologue and epilogue.

That call overhead is also there in the code that uses "if-else" logic to decide which functions to call.

So I still don't see your claim as accurate.


My comparison is between an array of function pointers, as described in the article, and the jump table with inlined bodies that the compiler generates for a switch statement - rather than an if-else tree. With the switch, no separate function is being called, so there is no function prologue or epilogue.

...not quite: the if-else (or switch-case) can usually inline the called function, which gets rid of the epilogue/prologue and opens up more optimization opportunities.

The call-table functions also get inlined, so the advantage is shared by both approaches, and yet the call-table produces more efficient code (no cmp/jump traps to fall into...)

See my comment here for the C code and Assembly that demonstrates this:

https://news.ycombinator.com/item?id=31834241


I can't reproduce that - neither clang nor gcc at -O3 inlines functions called through the table: https://godbolt.org/z/E7Tj31vox

Nor do I see any of that kind of behavior in your linked assembly - call_table clearly contains "call *%r8", which is an indirect call to a function, definitely not inlined.

Here's a more complete test: https://godbolt.org/z/dT45aKTe1 - putting the operation in a loop (to see the per-iteration cost more clearly), with -O3, on both clang and gcc. While gcc decides to do comparisons (it apparently wants at least 5 cases for a jump table), clang does do the indirect jump; neither does anything other than a plain call (thus suffering said register spilling and stack manipulation overhead) for the explicit function jump table.


And, some actual timings (code: https://godbolt.org/z/qnWEqs446):

    $ clang -DLEN=1024 -DITERATIONS=100000 -O3 switch.c && ./a.out
    predictable:
      call_table:   173.1ms
      switch_cases: 59.6ms
    random:
      call_table:   536.2ms
      switch_cases: 450.4ms
    $ gcc -DLEN=1024 -DITERATIONS=100000 -O3 -fcf-protection=none switch.c && ./a.out
    predictable:
      call_table:   174.5ms
      switch_cases: 114.7ms
    random:
      call_table:   581.8ms
      switch_cases: 108.8ms
    $ clang -DLEN=64 -DITERATIONS=2000000 -O3 switch.c && ./a.out
    predictable:
      call_table:   230.6ms
      switch_cases: 116.0ms
    random:
      call_table:   218.7ms
      switch_cases: 158.2ms
    $ gcc -DLEN=64 -DITERATIONS=2000000 -O3 -fcf-protection=none switch.c && ./a.out
    predictable:
      call_table:   872.7ms
      switch_cases: 146.3ms
    random:
      call_table:   740.4ms
      switch_cases: 132.2ms

The switch is faster in all cases, often by a big margin.

I'm using gcc under Ubuntu/WSL on Windows, so it may be some compiler thing.

But nevertheless, thank you for taking the time to follow up on this; it's a very interesting subject to me personally, and a lot has been learned through efforts such as yours in this thread.


Windows/WSL shouldn't change anything; gcc will have the same set of optimizations everywhere. There's just no inlining of the functions going on. There may be differences in dispatching between compilers and optimization levels, but none of them inline functions from a function table.

It'd be a pretty messy optimization: it'd need to check that all of the functions can be inlined instead of just any single one (and the requirements go up if the call would otherwise be a tail call, which costs a good bit less), and it'd make it impossible to predictably get an actual function jump table when the compiler sometimes just didn't emit one.


I prefer the flexibility of computed gotos, a GCC extension that is likely the fastest dispatch method short of compile-time dispatch. With computed gotos you can have arbitrary dispatch complexity and structure, interleave code (e.g. /* code1 */; skip_label:; label1: /* code2 */; goto *return_loc;) and use assembler-like tricks to shorten fast paths (e.g. switch off code-segment execution or alter control flow on the fly).
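A minimal sketch of that style of dispatch (GCC/Clang "labels as values" extension; the opcodes are invented):

    /* Tiny bytecode loop using computed goto. Each handler dispatches the
       next opcode itself, giving the predictor one indirect branch per
       handler instead of one shared dispatch branch. */
    enum { OP_INC, OP_DEC, OP_HALT };

    int run(const unsigned char *code) {
        static void *const dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
        int acc = 0;

    #define NEXT() goto *dispatch[*code++]
        NEXT();

    do_inc:  acc++; NEXT();
    do_dec:  acc--; NEXT();
    do_halt: return acc;
    #undef NEXT
    }

    /* run((const unsigned char[]){ OP_INC, OP_INC, OP_DEC, OP_HALT }) == 1 */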

> case statements are O(n), where n is the number of case/switch statements possible.

What now? Not even close. Any modern or even semi-modern (2005-vintage) compiler will compile large switches into a jump table (O(1)) and medium ones into a tree of branches, giving O(log n).


I was expecting some form of benchmark to see if claims about better performance is true.

Expecting jump tables to have higher performance than the alternatives definitely sounds iffy. This also reads as if the author doesn't know that switch-case statements also just use a jump table under the hood if the case blocks are contiguous.

Regarding performance: if the CPU's branch predictor works well, the jump-indirection overhead might disappear completely, but that's still not as good as the compiler inlining the destination function, and jump tables usually prevent that.

(a switch-case dispatcher might actually be better than a traditional function pointer jump table, because the switch-case eliminates some function entry/exit "ceremony", also see "computed goto")


Function pointer jumps definitely break most chances for a compiler to optimize a function call. But then, so does a switch statement, most of the time. (Not to mention switch statements are often transformed into jump tables anyways.)

Humans tend to do a lot better with jump tables in assembler, because better choices can be made about register usage etc., with less (or no) need to spill to the stack. It's one of the few remaining compiler weaknesses.


I'm also dubious about the claim that the switch statement is O(n). It might be in a pathological worst case, but you can pretty much bet the compiler is going to transform it into a jump table or other optimized dispatch (maybe a computed jump), especially when the cases are contiguous like this...

I agree that a benchmark is warranted, or at least a comparison of the generated assembly (at different optimization levels).


Also, O(n) doesn’t mean much when the CPU can execute hundreds of checks in a few tens of cycles.

Not for jumps - modern CPUs still (usually) have a limit of one taken branch per cycle, or one to two untaken branches. And branch prediction will usually be the limiting factor anyway, for which a single branch is going to be faster than many.

This is also how virtual methods are implemented by C++ compilers.
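Roughly what that compiles to, written out by hand in C (a simplified sketch; real vtables carry extra bookkeeping such as RTTI):

    #include <stdio.h>

    /* Hand-rolled equivalent of `struct Shape { virtual double area() const; };` */
    struct shape;
    struct shape_vtable { double (*area)(const struct shape *self); };
    struct shape        { const struct shape_vtable *vptr; };

    struct circle { struct shape base; double r; };

    static double circle_area(const struct shape *self) {
        const struct circle *c = (const struct circle *)self;
        return 3.141592653589793 * c->r * c->r;
    }
    static const struct shape_vtable circle_vtable = { circle_area };

    int main(void) {
        struct circle c = { { &circle_vtable }, 2.0 };
        const struct shape *s = &c.base;
        /* A C++ virtual call is essentially this double indirection: */
        printf("%f\n", s->vptr->area(s));
        return 0;
    }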

As others say I'm fairly sure switch/case creates a jump table under the hood, and it also checks bounds.

Personally, I don't think a static dispatch table like this is a good example. It's really more useful where the dispatch table might be dynamic; for example if a plugin could add more maths functions.


> ...and it also checks bounds.

If you want to get rid of the switch-case bounds check, add a default case with __builtin_unreachable() (or on MSVC: __assume(0)), this hints the optimizer to not create a bounds check before the jump table access. At your own risk of course ;)
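Something along these lines (compiler-specific builtins, shown with the usual #if guard); the default case tells the optimizer the value is always in range, which is undefined behaviour if it ever isn't:

    int dispatch(unsigned choice, int first, int second) {
        switch (choice) {
            case 0: return first + second;
            case 1: return first - second;
            case 2: return first * second;
            case 3: return first / second;
            default:
                /* Promise the optimizer that choice is always 0..3, so it can
                   drop the bounds check in front of the jump table. */
    #if defined(_MSC_VER)
                __assume(0);
    #else
                __builtin_unreachable();
    #endif
        }
    }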


There sure are lots of "fairly sure", "I feel" and "I think" comments in this thread.

I'm inclined to accept that Alice Goldfuss knows what they're talking about, partly because their Twitter is full of good tech stuff, and partly because I too have encountered situations where switching from a big branchy case statement to function pointers improved performance significantly.


That's because there's no right or wrong answer about this, and people on here are not just a bunch of juniors who should automatically accept as an authority anyone who blogs.

If switch/case actually does compare each and every item, as the article claims, then obviously a dispatch table is a solid choice to improve performance.

But that is just not likely the case in this example (and it does depend on a number of factors), so it's then largely a matter of taste. This technique definitely has its place, but this example is not the best, um, case, IMHO.


A compiler can generate a jump table. Whether it will is another matter. For a small number of cases or values with large numeric gaps you will most likely get a chain of conditionals.

IME the popular compilers (clang, gcc, msvc) are really aggressive about generating jump tables, though - e.g. if you have contiguous ranges of cases with gaps between them, they will create a jump table for each range rather than simply fall back to a dumb sequence of tests.

Addendum: if choice is directly or indirectly decided by user input, that code as written allows an out-of-bounds array access.
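A sketch of the usual guard, reusing the my_array table from the example above (the error value is arbitrary):

    typedef int math_function(int first, int second);
    extern math_function *my_array[4];          /* the table from the example above */

    int safe_dispatch(int choice, int first, int second) {
        /* Reject anything outside the table before the indirect call;
           a switch statement would get this bounds check for free. */
        if (choice < 0 || choice >= 4)
            return -1;
        return my_array[choice](first, second);
    }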


