
Single-use JIT Performance on x86 Processors - Nyan
https://github.com/animetosho/jit_smc_test/blob/master/README.md
======
andikleen2
You have to be a bit careful with the CLFLUSH method. I tried to use it in a
widely used program years ago because Intel recommended it, but we found that
it just hangs the CPU on some older VIA/Centaur CPUs. Presumably that's fixed
these days, but the old CPUs are likely still around.

~~~
Nyan
Thanks for the info! Unfortunately I don't have access to any VIA/Centaur
CPUs, so couldn't test on those (though test results welcome if anyone is
willing/able to!).

But yeah, you have to check the CPU you're running on when doing these tricks
unfortunately, as results vary greatly across micro-architectures.

Interestingly, there's a more efficient CLFLUSHOPT instruction on more recent
processors, which often seems to be quite effective for this task.
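
To make the "check the CPU you're running on" point concrete, here is a rough sketch of picking a flush instruction at runtime. It is not code from the article. The names `pick_flush` and `flush_range`, the 64-byte line size, and the choice to skip flushing entirely on the "CentaurHauls" vendor string are all assumptions for illustration. It assumes GCC or Clang on x86-64.

```c
#include <assert.h>
#include <cpuid.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE 64 /* assumed cache-line size; not queried from the CPU here */

enum flush_kind { FLUSH_NONE, FLUSH_CLFLUSH, FLUSH_CLFLUSHOPT };

/* Hypothetical helper: decide which flush instruction, if any, to use. */
enum flush_kind pick_flush(void)
{
    unsigned eax, ebx, ecx, edx;
    char vendor[13] = {0};

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return FLUSH_NONE;
    /* The vendor string is stored in EBX, EDX, ECX order. */
    memcpy(vendor, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);

    /* Assumption: avoid flushing on VIA/Centaur, given the hang report above. */
    if (strcmp(vendor, "CentaurHauls") == 0)
        return FLUSH_NONE;

    /* CLFLUSHOPT: CPUID leaf 7, subleaf 0, EBX bit 23. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) && (ebx & (1u << 23)))
        return FLUSH_CLFLUSHOPT;
    /* CLFLUSH: CPUID leaf 1, EDX bit 19. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 19)))
        return FLUSH_CLFLUSH;
    return FLUSH_NONE;
}

__attribute__((target("clflushopt")))
static void flush_with_clflushopt(uintptr_t p, uintptr_t end)
{
    for (; p < end; p += LINE)
        _mm_clflushopt((void *)p);
    _mm_sfence(); /* CLFLUSHOPT is weakly ordered; fence before later stores */
}

static void flush_with_clflush(uintptr_t p, uintptr_t end)
{
    for (; p < end; p += LINE)
        _mm_clflush((const void *)p);
}

/* Flush every cache line overlapping [buf, buf + len); no-op for FLUSH_NONE. */
void flush_range(enum flush_kind kind, const void *buf, size_t len)
{
    assert(buf != NULL);
    uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(LINE - 1);
    uintptr_t end = (uintptr_t)buf + len;

    if (kind == FLUSH_CLFLUSHOPT)
        flush_with_clflushopt(p, end);
    else if (kind == FLUSH_CLFLUSH)
        flush_with_clflush(p, end);
}
```

A JIT would call `pick_flush()` once at startup and then `flush_range()` on the code buffer at whatever point the technique calls for. Whether flushing helps at all on a given micro-architecture still has to be measured.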

------
MaxBarraclough
Looks fun, but impractical. Are there real uses for this kind of thing, on
modern architectures?

Related question: has anyone tried to create a high-level language for doing
this kind of madness?

~~~
vardump
There are practical applications for this.

Perhaps most importantly, regexp compilation, which is a fairly common use
case [1].

Some other examples I can think of: JIT compiling an eval statement,
constructing a filter (for a database scan, for example), or a CPU-based
pixel shader.

A lot of code is JIT compiled, but executed only once. So this was a pretty
interesting article with practical performance implications in some scenarios.

[1]: See for example https://www.pcre.org/original/doc/html/pcrejit.html

~~~
Nyan
Thanks for the info. I'm not particularly familiar with common JIT
applications, but I suspect that this use-case is actually more niche than one
may think.

The problem is that the example presented requires a memory page with write +
execute permissions (at the same time). I suspect many JITs don't do this for
security reasons (and to deal with OSes which don't allow it), as it may make
it easier for an attacker to gain arbitrary code execution. It's likely that
many JITs toggle between write and execute permissions, rather than have both
enabled at the same time. Whilst this reduces attack surface, changing
permissions on allocated memory requires syscalls, which are quite expensive
in terms of performance.

The scenario presented in the article avoids the impact of syscalls, to
maximize performance, leaving only the impact caused by the processor itself.
If a JIT isn't overly concerned with this type of security, using
write+execute memory could be a way to avoid syscall overhead. On the other
hand, if a JIT does toggle permissions, the syscall overhead is likely much
more significant than overheads caused by the processor (although the
techniques shown might still help depending on how the JIT engine works).
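
For reference, the "toggle between write and execute" approach described above looks roughly like this on a POSIX system. This is a minimal sketch, not how any particular JIT does it. `jit_publish` is a hypothetical name, and the `mprotect` call is the syscall whose cost is being discussed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `len` bytes of generated code into a fresh
 * page-aligned region that is writable but not executable, then flip it to
 * executable but not writable. Returns the region (its size is stored in
 * *mapped_len), or NULL on failure.
 */
void *jit_publish(const void *code, size_t len, size_t *mapped_len)
{
    assert(code != NULL && mapped_len != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;

    /* Round the length up to a whole number of pages. */
    size_t pg = (size_t)page;
    size_t size = (len + pg - 1) / pg * pg;

    /* Step 1: writable, not executable. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* Step 2: one syscall per toggle; this is the overhead in question. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }

    *mapped_len = size;
    return mem;
}
```

The returned pointer can be cast to a function pointer and called if the bytes are valid code for the host CPU. Release the region with `munmap(ptr, mapped_len)`.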

~~~
vardump
Yeah, the security implications are obvious: R+W+X should not be used with
untrusted inputs.

Not that I'd recommend this, but alternatively you could also map the exact
same memory twice, once with R+X and once with R+W. The attacker would need to
figure out the writable address. Unfortunately there are probably a lot of
ways to accidentally leak this information to the attacker...
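
A rough sketch of the double-mapping idea, assuming Linux with glibc 2.27 or later for `memfd_create`. The `dual_map` struct and function names are made up for illustration. Other systems would need a different way to get a shareable file descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical pair of views onto the same physical pages. */
struct dual_map {
    void  *rw;   /* writable view: the JIT emits code through this one */
    void  *rx;   /* executable view: the code is called through this one */
    size_t size;
};

/* Returns 0 on success, -1 on failure. */
int dual_map_create(struct dual_map *dm, size_t size)
{
    assert(dm != NULL && size > 0);

    /* Anonymous in-memory file backing both mappings. */
    int fd = memfd_create("jit-code", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    /* MAP_SHARED so that writes via one view are visible via the other. */
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd); /* the mappings keep the memory alive */

    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) munmap(rw, size);
        if (rx != MAP_FAILED) munmap(rx, size);
        return -1;
    }

    dm->rw = rw;
    dm->rx = rx;
    dm->size = size;
    return 0;
}

void dual_map_destroy(struct dual_map *dm)
{
    assert(dm != NULL);
    munmap(dm->rw, dm->size);
    munmap(dm->rx, dm->size);
}
```

No page is ever writable and executable through the same address, and no `mprotect` call is needed after setup. The protection only holds for as long as the `rw` address stays secret.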

There are still plenty of use cases where inputs can be trusted.

~~~
saagarjha
For performance I think this scheme is fairly common for W^X JITs.

