"Compilers are pretty bad at optimizing virtual machine interpreters, and this is unlikely to change. Here we go, straight from the horse's (or, uh, Mike Pall's) mouth"
This could possibly fall into my "strictly necessary" bucket. I'm not sure what the timing differences actually amount to, either: I've got a threaded interpreter in a fast path in some of my code, and it hasn't proved to be a bottleneck (less than 1% of the time during a low-latency event is spent running the programs), so I haven't looked closely at register allocation through it.
I reiterate that this comes at a severe cost to ease of extension and refactoring, however, and shouldn't be undertaken lightly.