I don't think that `can_enter_jit` and `jit_merge_point` are hints for optimization. I'm not particularly familiar with JIT compilation or RPython, but it looks like RPython provides other mechanisms for trace optimization.
For example, there is a decorator, `@purefunction`, that hints to the translator that a function will always return the same output given the same input, even if it operates on a data structure that could be considered "opaque".
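For illustration, here is a minimal sketch of how such a decorator is applied. It assumes RPython's `rpython.rlib.jit` module, where, as far as I know, `@purefunction` was later renamed `@elidable`. A no-op fallback is defined so the snippet also runs under plain CPython. The `Dictionary` class and `lookup` function are hypothetical.

```python
try:
    # Real RPython toolchain; newer versions spell @purefunction as @elidable.
    from rpython.rlib.jit import elidable
except ImportError:
    def elidable(func):
        """No-op stand-in so the sketch runs under plain CPython."""
        return func


class Dictionary(object):
    """Hypothetical name -> index table, never mutated after construction."""

    def __init__(self, names):
        self.names = list(names)


@elidable
def lookup(table, name):
    # The same (table, name) always yields the same result, so a tracing JIT
    # is allowed to constant-fold calls whose arguments are known constants.
    for i in range(len(table.names)):
        if table.names[i] == name:
            return i
    return -1


table = Dictionary(["print", "len", "push"])
print(lookup(table, "len"))  # prints 1
```

The hint is only a promise from the interpreter author. If `table.names` were mutated after a trace had folded a lookup, the compiled code would silently use a stale result.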
I'm not familiar with it either; I'm just going off the article.
> One thing the trace optimiser knows is that because program_counter is passed to jit_merge_point as a way of identifying the current position in the user's end program, any calculations based on it must be constant. These are thus easily optimised away, leaving the trace looking as in Figure 3.
I think you're right that `jit_merge_point` affects performance. My point was that it isn't so much an optimization of JIT compilation as a requirement for the JIT compiler to operate at all: the compiler has to know what defines an execution frame. Once you've done that, there are definitely other optimizations that go into reducing the opacity of your interpreter to the JIT compiler.
So you're probably right that a fair amount of optimization can come from doing a good job of defining what identifies an execution frame. I'm curious, though, whether these are the sort of "low-hanging fruit" the author was referring to, or whether they were already part of the straightforward port of the original C interpreter.
From the section on "Optimizing an RPython JIT", it seems that he's doing a lot more than just defining his execution frame.
> The first tactic is to remove as many instances of arbitrarily resizable lists as possible. The JIT can never be sure when appending an item to such a list might require a resize, and is thus forced to add (opaque) calls to internal list operations to deal with this possibility.
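A sketch of that tactic in plain Python: instead of `append`/`pop` on a growable list, the interpreter preallocates a list of a fixed maximum size and tracks the top with an integer stack pointer, so no operation can ever trigger a resize. The `FixedStack` class and the `max_depth` bound are illustrative assumptions, not code from the article.

```python
class FixedStack(object):
    """Hypothetical operand stack whose backing list never resizes."""

    def __init__(self, max_depth):
        # Allocate every slot up front; len(self.items) never changes.
        self.items = [0] * max_depth
        self.sp = 0  # index of the next free slot

    def push(self, value):
        if self.sp >= len(self.items):
            raise OverflowError("stack depth exceeded")
        self.items[self.sp] = value
        self.sp += 1

    def pop(self):
        if self.sp == 0:
            raise IndexError("pop from empty stack")
        self.sp -= 1
        return self.items[self.sp]


stack = FixedStack(max_depth=4)
stack.push(2)
stack.push(3)
stack.push(stack.pop() + stack.pop())
print(stack.pop())  # prints 5
```

This only works when a maximum depth is known up front (for example, if the bytecode compiler records it per function). In exchange, RPython can, as far as I understand, treat `items` as a fixed-size list, so pushes and pops become plain indexed reads and writes rather than opaque calls.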
I guess my original point was that RPython has lots of hooks (e.g. the decorators you mentioned) that allow the JIT to trace the interpreted program effectively. You would probably have to extend LuaJIT with similar hooks to use it the same way. As tomp pointed out, the interpreter loop itself is not a good candidate for tracing: the JIT needs at least to know about the interpreter's program counter to be able to identify loops in the interpreted program. You're right that referring to these as optimizations is inaccurate.
`jit_merge_point` isn't an optimization. It tells the PyPy JIT how to separate the parts of the trace that belong to your interpreter loop from the parts that implement each bytecode.
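To make this concrete, here is a minimal sketch of an RPython-style interpreter loop for a hypothetical toy bytecode. `JitDriver`, `jit_merge_point` and `can_enter_jit` are the names used in `rpython.rlib.jit`. The opcodes, the `interpret` function and the no-op fallback for plain CPython are assumptions made for illustration. The greens (`pc`, `program`) identify a position in the user's program and the reds (`a`, `b`) are the mutable state. `can_enter_jit` sits at the backward jump, which is where a loop in the user's program closes.

```python
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        """No-op stand-in so the sketch runs under plain CPython."""

        def __init__(self, greens, reds):
            self.greens = greens
            self.reds = reds

        def jit_merge_point(self, **live_vars):
            pass

        def can_enter_jit(self, **live_vars):
            pass


# Hypothetical toy bytecode: each instruction is an (opcode, argument) pair.
DEC_A = 0    # a -= 1
ADD_B = 1    # b += argument
JUMP_IF = 2  # if a != 0: jump to instruction index `argument`
HALT = 3     # return b

# Greens identify the position in the user's program; reds are the state.
driver = JitDriver(greens=["pc", "program"], reds=["a", "b"])


def interpret(program, a):
    pc = 0
    b = 0
    while True:
        # Top of the dispatch loop: one user-level instruction starts here.
        driver.jit_merge_point(pc=pc, program=program, a=a, b=b)
        opcode, arg = program[pc]
        if opcode == DEC_A:
            a -= 1
            pc += 1
        elif opcode == ADD_B:
            b += arg
            pc += 1
        elif opcode == JUMP_IF:
            if a != 0:
                pc = arg
                # Backward jump: a loop in the user's program may close here.
                driver.can_enter_jit(pc=pc, program=program, a=a, b=b)
            else:
                pc += 1
        elif opcode == HALT:
            return b
        else:
            raise ValueError("unknown opcode")


# Adds 7 to b, `a` times (a must start positive).
PROGRAM = [(ADD_B, 7), (DEC_A, 0), (JUMP_IF, 0), (HALT, 0)]
print(interpret(PROGRAM, 3))  # prints 21
```

Without the greens, a tracer would only see the single `while True` dispatch loop and would have no way to tell one user-level loop from another.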