Windows decided to go the "always JIT and just cache frequent code blocks" method though. In the end whichever you choose it doesn't seem to make a big difference.
> Windows for ARM is compiled for the ARM memory model, is executed natively and runs at near native M1 speed. There is [some] hypervisor overhead, but there is no emulation involved.
This section was referring to the emulation performance not native code performance:
"it knows basically none of the "tricks" available but the _emulation speed_ isn't much slower than the Rosetta 2 emulation ratio "
Though I'll take native apps any day I can find them :).
> Windows decided to go the "always JIT and just cache frequent code blocks" method though. In the end whichever you choose it doesn't seem to make a big difference.
AOT (that is, static binary translation before the application launches) vs JIT does make a big difference. JIT always operates under a «time spent JIT'ting vs performance» tradeoff, which AOT does not. The AOT translation layer has to be reasonably fast, but it is a one-off step, so it can invariably afford to spend more time analysing the incoming x86 binary and applying more heuristics and optimisations, yielding a faster-performing native binary. A JIT engine has to do the same work on the fly, under tight time constraints and under the constant threat of needlessly trashing CPU cache lines and TLB lookups (the worst-case scenario being a freshly JIT'd instruction sequence spilling over into a new memory page).
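To make the tradeoff concrete, here is a toy sketch (all names hypothetical, and not how Rosetta 2 or the Windows emulator actually work): an AOT pass sees the whole fake "binary" at once and can fold instructions across boundaries, while a JIT translates one instruction at a time with only a cache, because there is no time budget for cross-instruction analysis.

```python
# A fake x86-like program: (opcode, register, operand) tuples.
PROGRAM = [("mov", "r0", 1), ("add", "r0", 2), ("add", "r0", 3), ("mov", "r1", 0)]

def translate(instr):
    """Translate one fake-x86 instruction to a fake-ARM string
    (a stand-in for the expensive per-instruction work a real translator does)."""
    op, *args = instr
    return f"arm_{op} " + ", ".join(map(str, args))

def aot_translate(program):
    """One-off pass over the whole binary: can afford a global peephole
    optimisation, here folding consecutive adds to the same register."""
    out, i = [], 0
    while i < len(program):
        if (i + 1 < len(program)
                and program[i][0] == program[i + 1][0] == "add"
                and program[i][1] == program[i + 1][1]):
            # Fold: add r0, 2 ; add r0, 3  ->  add r0, 5
            folded = ("add", program[i][1], program[i][2] + program[i + 1][2])
            out.append(translate(folded))
            i += 2
        else:
            out.append(translate(program[i]))
            i += 1
    return out

class JitTranslator:
    """Translates instructions on demand and caches the result.
    No cross-instruction optimisation: under JIT time pressure,
    each instruction is translated 1:1 on first sight."""
    def __init__(self):
        self.cache = {}

    def run(self, program):
        out = []
        for instr in program:
            if instr not in self.cache:   # translate only on first sight
                self.cache[instr] = translate(instr)
            out.append(self.cache[instr])
        return out

aot = aot_translate(PROGRAM)        # 3 native instructions: the adds were folded
jit = JitTranslator().run(PROGRAM)  # 4 native instructions: translated 1:1
```

The point of the toy is only the shape of the tradeoff: the AOT output is shorter because the translator could look across instruction boundaries, which a per-block JIT never gets the time to do.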
> "it knows basically none of the "tricks" available but the _emulation speed_ isn't much slower than the Rosetta 2 emulation ratio "
I still fail to comprehend which tricks you are referring to, and I would also be very keen to see actual figures substantiating the AOT vs JIT emulation speed claim.
I've seen mentions of a JIT path, but only as a fallback when the AOT path doesn't cover the use case (e.g. an x86 app with dynamic code generation), not as an optimization pass. https://support.apple.com/guide/security/rosetta-2-on-a-mac-...