This has been a pet interest of mine for decades. Did you run any of this under perf to check where the time is going? The benchmark may be too simple (too regular) to show it, but the biggest issue with all of these approaches is that the branch predictor doesn't have enough context to make good predictions, so you end up mispredicting a lot.
This paper https://www.cs.toronto.edu/~matz/dissertation/matzDissertati... (found the reference in https://coredumped.dev/2021/10/21/building-an-emacs-lisp-vm-...) suggests an interesting option which isn't portable, but is a lot less work than a full JIT: generate only call and conditional branch instructions. This was enough for them to basically fix all the branch mispredictions. I have yet to get it working, but maybe someone else will feel inspired. The pain I ran into when trying is that a naively mmap'ed region will be very far from the code you are trying to call, so you won't be able to use simple relative call instructions. The fix is to explicitly try to allocate the region closer to that code.
PS: Another technique I have used in Rust (the idea was from Cloudflare) is to generate closures and have each closure tail-call the next. It ends up very similar to your continuation passing, but at least on my benchmarks it was _slightly_ worse than the naive dispatch.
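To make the closure idea concrete, here is a minimal sketch, assuming a hypothetical three-instruction stack VM. The `Op`, `Step`, `compile`, and `run` names are invented for illustration and are not taken from the article or from Cloudflare's code. Each instruction is compiled once into a boxed closure that owns the closure for the rest of the program and calls it as its last action.

```rust
// Hypothetical bytecode, invented for this sketch.
#[derive(Clone, Copy, Debug)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
}

// A compiled instruction: a closure that owns the closure for the rest of the program.
pub type Step = Box<dyn Fn(&mut Vec<i64>)>;

// Build the chain back to front so that every closure can capture its successor.
pub fn compile(ops: &[Op]) -> Step {
    // Terminal continuation: nothing left to execute.
    let mut next: Step = Box::new(|_stack: &mut Vec<i64>| {});
    for &op in ops.iter().rev() {
        let rest = next; // the already-compiled tail of the program
        next = match op {
            Op::Push(v) => Box::new(move |stack: &mut Vec<i64>| {
                stack.push(v);
                rest(stack) // call in tail position; Rust does not guarantee it becomes a jump
            }),
            Op::Add => Box::new(move |stack: &mut Vec<i64>| {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a + b);
                rest(stack)
            }),
            Op::Mul => Box::new(move |stack: &mut Vec<i64>| {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a * b);
                rest(stack)
            }),
        };
    }
    next
}

// Compile the program, run it on an empty stack, and return the final stack.
pub fn run(ops: &[Op]) -> Vec<i64> {
    let program = compile(ops);
    let mut stack = Vec::new();
    program(&mut stack);
    stack
}
```

Calling `run(&[Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul])` evaluates (2 + 3) * 4. Stable Rust has no guaranteed tail calls, so unless the optimizer happens to turn these calls into jumps, the native stack depth grows with the length of the program. A real implementation would need to handle loops and long programs differently.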
I would expect the branch predictor to have issues, especially on a chip that's nearly 10 years old. However, I controlled for this as best I could. The if statements that control the VM's flow are the same in each test. The only real difference is that the switch statement has to branch to reach the next virtual operation, while all the others use unconditional jumps to do that.