Everyone here seems to be making this solely about the processor, which it partially is, but the other story is that Rosetta 2 is looking really good. Obviously there’s plenty more to test beyond a Geekbench run, but hitting nearly 80% of native performance shows both the benefits of ahead-of-time transpiling and the sophistication of Apple's implementation.
I’ll still be waiting to see what latency-sensitive performance is like (specifically audio plugins) but this is halfway to addressing my biggest concern about buying an M-series Mac while the software market is still finding its footing.
As some others have mentioned, Rosetta uses AOT translation, which contrasts with Microsoft's approach of emulation.
I think the difference in approach comes down to motive: I get the impression that Microsoft wanted to push developers to write ARM apps and publish them to its Store, by making emulated programs less attractive by virtue of their poorer performance.
Apple, on the other hand, is keen to get rid of Intel as quickly as it can, so it had to make the transition as seamless as possible.
Microsoft calls their approach emulation, but if you read the details you see they are also translating instructions and caching the result (AOT translating).
Their current implementation only supports 32-bit code; x64 translation is still underway. It is not yet known how well translated x64 code will perform relative to native code.
I am similarly curious about Rosetta 2, but there seems to be very little marketing information available, let alone technical detail. All I can figure out is that it performs user-mode emulation, similar to what QEMU can do, and does not cover some of the newer instructions.
Off topic: I've worked in virtualisation for the last 12 years, so I am very familiar with the publicly available body of research on this topic. Ahead-of-time binary translation has been a niche area at best.
My understanding from WWDC is that it does AOT compilation at install time when possible. If an app marks a page executable at runtime (such as a JIT) it will interpret that.
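To make that concrete: the runtime pattern that defeats install-time translation is the usual JIT dance of mapping a writable page, emitting code into it, and flipping it to executable. A toy sketch in C (nothing Rosetta-specific, just illustrating the pattern a translation layer has to catch):

    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Toy JIT: emit machine code at runtime, then make the page executable.
     * An install-time translator never sees these bytes, so they have to be
     * handled dynamically (interpreted or translated on the fly). */
    int main(void) {
        /* x86-64 encoding of: mov eax, 42; ret */
        static const uint8_t code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANON, -1, 0);
        memcpy(page, code, sizeof code);

        /* The writable -> executable transition is the moment a translation
         * layer has to notice that new guest code exists. */
        mprotect(page, 4096, PROT_READ | PROT_EXEC);

        int (*fn)(void) = (int (*)(void))page;
        return fn(); /* returns 42 (on an x86-64 host) */
    }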
Newer instructions are still encumbered by patents.
What do these instructions do that someone working on their own couldn't figure out? I am sick of patents lodged just because particular engineers got there first. When you finally save enough money to do your own research, you often learn that your idea has already been patented. For me it just made the work more expensive, as I had to spend time finding a way around the patent. Such a monopoly on ideas should not be legal.
> patents lodged just because particular engineers got there first
That is what patents are. They encourage you to actually get there by giving you a time-limited monopoly on the implementation. It is easy to say "a freshman CS student could have figured this out", but if they did, they could have had the patent instead and licensed it to Intel and Apple. Instead, Intel and Apple had to figure it out on their own.
That is what patents attempt to be. There are many reasons that make them less effective at this than their ideal.
In particular, costs are high and litigation to defend a patent is expensive, so your average student wouldn't be able to afford this. The fact that software typically ships digitally means borders are practically non-existent, so wide patent coverage is often needed, or a company has to give up on defending its patent outside its primary market.
Patenting an idea will probably run around $10-100k per market. To cover the US, EU, and the large Asian markets, you're looking at $500k-1m, and that's just to get the patent. Then you'll need to defend it, which can be hard against entities based in non-compliant countries such as China.
This all means that unless you're defending the very core of your entire business proposition, you probably need to be a >$100m company before it's worth pursuing patents, and even for the core of your business you probably need several million in funding.
I don't think a freshman CS student could afford to apply for a patent. That's reserved for big companies, or engineers backed by wealth. You also often see things patented that nobody else thought to patent because they are that obvious. It's just a mechanism to gatekeep and secure profits for the big guys. I have a friend who has multiple patents, and their investor only agreed to put money into the project if they patented it and shared all the revenue. The investor wasn't actually interested in the product itself, apart from the likelihood that the patents would bring money. The project is now dead, but the patents remain, blocking anyone from trying the same idea.
Actually, speaking from experience, you can do this yourself (or mostly yourself with some guidance from professionals). Nolo press has a book ("Patent it Yourself") on the topic.
"If an app marks a page executable at runtime (such as a JIT) it will interpret that."
Was that from a presentation on Rosetta 2 that I missed? It certainly makes sense, but you'd also need to watch for writes to executable pages that have already been AOT-translated.
JITs usually mark +x pages -w again after they've written to them, for sanity, so watching mprotect in the platform libc could do it; but you could also force them to be -w anyway and then catch the writes in the page fault handler. How much integration with the kernel does it have?
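A minimal sketch of the libc-interposition idea, purely illustrative (I have no idea whether Rosetta 2 actually hooks it this way), using dyld's __DATA,__interpose mechanism:

    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>

    /* Hypothetical hook: watch every mprotect call and notice when a range
     * becomes executable, so the translator can (re)translate it before any
     * thread branches into it. Not how Rosetta 2 is documented to work. */
    static int my_mprotect(void *addr, size_t len, int prot) {
        if (prot & PROT_EXEC)
            fprintf(stderr, "new executable range: %p + %zu\n", addr, len);
        return mprotect(addr, len, prot);
    }

    /* dyld interposition: a (replacement, replacee) pair placed in the
     * __DATA,__interpose section of a library loaded into the process. */
    __attribute__((used, section("__DATA,__interpose")))
    static struct { const void *replacement; const void *replacee; }
    interpose_mprotect = { (const void *)my_mprotect, (const void *)mprotect };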
JavaScript is the extreme example here (no one is going to run Chrome or Firefox in Rosetta 2, probably), but it re-JITs very often (I think Firefox starts by interpreting JS code, then replaces hot code paths with JIT-compiled equivalents as they become available).
I also see issues with self-modifying code, but that is very rare (I knew of some .NET code that injected/manipulated its own JITted code to get around C# private/protected encapsulation).
> no one is going to run Chrome or Firefox in Rosetta2 probably
Browsers themselves, no, but there are many apps out there based on browser tech (e.g. Electron apps), and many apps ship with their own JVM or other JIT engine.
Anecdote incoming: my pet project is based on Electron. I'm currently building a Mac x86 version but don't plan to ship an ARM version (since testing it without an actual physical Mac is going to be difficult, if not impossible).
Ah, the question is whether Darwin enforces W^X, because if so, the problem becomes very easy. Then you re-translate whenever a page becomes executable [again].
It seems to me that the correct term is "Static Binary Translation" (SBT) for what you call "ahead of time binary translation".
And the correct term for "JIT-based emulation" is "Dynamic Binary Translation" (DBT).
At least these are the terms you should use if you want to find some literature on this subject.
We don't talk about a JIT or AOT compiler here because it's not really compilation (compilation is translation to a lower-level language).
I think a lot of people talk about JIT rather than DBT because the JIT term is better known, and there is confusion when Apple says they do "Dynamic translation for JITs".
Which means: they do DBT to handle applications that use a JIT.
You are correct, static binary translation is what Rosetta does first. That, however, is what I called a niche technology in another post; most of the research so far has focused on dynamic binary translation.
Furthermore, SBT, even for user-mode binaries, rarely reaches the performance levels we see with Rosetta 2. There are many issues in determining what is code, where the branch destinations are in the case of indirect branches, etc. What we have here is certainly a feat of engineering in its own right.
> There are many issues in determining what is code, where the branch destinations are in the case of indirect branches, etc.
Yes, handling indirect branches seems a bit complex, and I'm not a specialist in the field.
But I'm pretty sure indirect branches are rare enough that an additional indirection is relatively inexpensive.
Adding a simple address mapping table should cover most of the cases.
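Roughly what I have in mind, as a sketch (the names and the lookup structure are made up; a real translator would likely do something smarter, e.g. a hash table plus inline caching at the branch site):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical side table mapping guest (x86) code addresses to the
     * translated (ARM) entry points. A sorted array + binary search keeps
     * the sketch short. */
    typedef struct {
        uint64_t guest_pc;   /* original x86 address             */
        void    *host_code;  /* translated code for that address */
    } map_entry;

    extern map_entry branch_map[];
    extern size_t    branch_map_len;

    /* Called from the stub emitted at each translated indirect branch: the
     * target is only known at run time, so we look up the translated block. */
    void *resolve_indirect(uint64_t guest_pc) {
        size_t lo = 0, hi = branch_map_len;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (branch_map[mid].guest_pc < guest_pc)      lo = mid + 1;
            else if (branch_map[mid].guest_pc > guest_pc) hi = mid;
            else return branch_map[mid].host_code;
        }
        return NULL; /* unknown target: fall back to dynamic translation */
    }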
An interesting question would also be whether Apple has added features to the hardware to improve the translation.
We know, for example, that Apple introduced a special register [1] to temporarily switch from ARM's memory consistency model to x86's TSO (Total Store Order) model.
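The reason that matters: x86 code is written against a strong (TSO) model, where plain loads and stores already behave roughly like acquire/release. A straight translation to ARM's weaker model would need extra barriers to preserve that, unless the hardware runs in TSO mode. An illustrative C example of the kind of pattern that is affected:

    #include <stdatomic.h>

    /* Classic message-passing idiom. On x86 (TSO), plain stores and loads
     * already guarantee the consumer never sees flag == 1 with a stale data
     * value. A naive x86->ARM translation of plain mov instructions would
     * need added barriers (or the hardware TSO mode) to keep that guarantee.
     * The explicit acquire/release below is what makes it correct in
     * portable C on any architecture. */
    int data;
    atomic_int flag;

    void producer(void) {
        data = 42;
        atomic_store_explicit(&flag, 1, memory_order_release);
    }

    int consumer(void) {
        while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
            ; /* spin until the producer publishes */
        return data; /* guaranteed to read 42 */
    }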
That is marketing terminology (because "emulation is slow"). Full static transpiling is not a solvable problem - you can't actually take an x86 app, run it through some converter, and get an ARM app out. It's just not a thing and it never will be (without cheating and, like, literally embedding an emulator in the app).
Anything less than that is emulation and requires dynamic elements. All modern emulators use JIT, and caching the result is similar to AoT translation; in fact JIT can sometimes be faster than AoT, since it can take advantage of runtime profiling. And even for binaries with no self-modifying code, you can never guarantee ~full AoT translation without additional metadata (like a list of all branch destinations), so Rosetta cannot possibly claim it does that with full coverage. On top of that, you need to add a level of indirection to all indirect branches, since you cannot statically rewrite all the function pointers sitting in data structures (that's an even harder problem). At that point you're adding enough bookkeeping gunk to the translated code that it is no longer the straight translation Apple would want you to believe it is. JIT is binary translation too, so by Apple's marketing standards, qemu, Dolphin, and basically every other modern emulator is also doing "translation". Which is just not a useful distinction.
So everyone saying "Rosetta 2 is AoT translation", as if that means it's fundamentally better/faster than other emulation technologies, is just falling for marketing.
Whatever you call it, it's not fundamentally different from any other emulator in a way that puts it in another class of technology. It is not straight converting x86 to ARM. That's just not a thing and it never will be. The end result is that the CPU is going to be executing a series of translated basic blocks interspersed with code added by the translation to glue everything together, which is the same thing every JIT-based emulator does, and will have the same performance characteristics, and the fact that some of that work can be done ahead of time is not a fundamental difference.
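For what it's worth, the "translated blocks plus glue" shape described above is essentially the classic dispatch loop that every JIT-based binary translator shares; schematically (not Rosetta's actual design, just the general structure):

    #include <stdint.h>

    /* Schematic core loop of a binary translator: look up (or translate) the
     * block for the current guest PC, execute it, and let it return the next
     * guest PC. The lookup, block chaining, and indirect-branch stubs are the
     * "glue" between the straight-line translated code. */
    typedef struct cpu_state cpu_state;                 /* guest register file, etc. */
    typedef uint64_t (*translated_block)(cpu_state *);  /* runs one block, returns next PC */

    /* Assumed helper: consult the translation cache, translating on a miss. */
    translated_block find_or_translate(uint64_t guest_pc);

    void run(cpu_state *cpu, uint64_t guest_pc) {
        for (;;) {
            translated_block block = find_or_translate(guest_pc);
            guest_pc = block(cpu);  /* execute one translated basic block */
        }
    }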
If you want to look for reasons why Rosetta 2 is faster than other emulators, look for places where Apple cheated and made their CPUs implement x86 things like its memory consistency model. That can have massive gains. I bet if you port a decent JIT-based emulator to use that feature on M1, and compare it to Rosetta 2 for number crunching inner loops and such, you'll find you can get very similar performance numbers out of it once the JIT cache is warm.
It'll be interesting when people take a deep dive into specific things Rosetta 2 does.
Why would they market it? They will market the result--performance. Rosetta is just for geeks to appreciate, laypersons don't care about that stuff, all they want to know is whether the end result is faster or slower, as that's what affects them.
That doesn't match with what we know about Rosetta 2. Rosetta 2 can't run processes with mixed architectures, so ARM hosts can only run in-process ARM plug-ins, and x86 hosts can only run in-process x86 plug-ins. Apple's AUv3 plug-in architecture is out-of-process, so you can mix those, but there is no way you can mix ARM and x86 VSTs, for example, without specific work by hosts to provide an out-of-process shim translation layer.
Either he's talking about AUv3 specifically, or the hosts he tested already are doing out-of-process wrapping, or Rosetta 2 is actually magic (AFAICT this isn't a generally solvable problem at that layer), or he's confused.
Why aren't programs distributed by the Mac App Store pre-translated once and the translation downloaded to the Mac?
They'd still have to run under Rosetta 2 (because programs can generate code at runtime and branch to it), but a lot of the computation could be done once rather than every time.
Not sure, but my educated guess is that it's more efficient to do this on demand as they improve Rosetta 2, instead of pre-translating every app on the store every time Rosetta 2 is updated.
Even if that means the users have to retranslate, that's still essentially "free" (to Apple) distributed compute.
It’s the other way around: an intermediate state of compilation is uploaded (basically the internal state after all non-machine-specific optimizations have been done, like hoisting invariants out of loops, etc). The App Store finishes compiling with the flags for the various CPUs supported (iPhone 6, iPhone 12, etc).
Translation of most apps doesn't take a significant amount of time, and having them pre-translated would mean shipping code that wasn't signed by the app developer.