I think you're right that jit_merge_point affects performance. My point was that this isn't an optimization of the JIT compilation, so much as it is a requirement for a JIT compiler to be able to operate. The compiler has to know what defines an execution frame. Once you've done that, there are definitely other optimizations that go in to reducing the opacity of your interpreter to the JIT compiler.
So, you're probably right that a fair amount of optimization can come from doing a good job of defining what identifies an execution frame. I'm curious though, if these are the sorts of "low hanging fruit" the author was referring to, or if they're included in the straightforward port of the original C interpreter.
From the section on "Optimizing an RPython JIT" it seems that the he's doing a lot more than just defining his execution frame.
> The first tactic is to remove as many instances of arbitrarily resizable lists as possible. The JIT can never be sure when appending an item to such a list might require a resize, and is thus forced to add (opaque) calls to internal list operations to deal with this possibility.
I guess my orginal point was that RPython has lots of hooks (eg the decorators that you mentioned) which allow the JIT to effectively trace the interpreted program. You would probably have to extend LuaJIT with similar hooks in order to use it in the same way. As tomp pointed out, the interpreter loop itself is not a good candidate for tracing. The JIT needs to at least know about the interpreters program counter to be able to identify loops in the interpreted program. You're right that referring to these as optimizations is inaccurate.