If this can be reproduced on the iPhone, it can lead to third-party keyboards exfiltrating data. By default, keyboard app extensions are sandboxed away from their owning applications, but they may communicate with the app over this channel and leak data. It's not as easy as I describe, because the app would have to be alive and scheduled on the same cluster, but it's within the realm of possibility.
Here is the follow-up:
> However, since iOS apps distributed through the App Store are not allowed to build code at runtime (JIT), Apple can automatically scan them at submission time and reliably detect any attempts to exploit this vulnerability using static analysis (which they already use). We do not have further information on whether Apple is planning to deploy these checks (or whether they have already done so), but they are aware of the potential issue and it would be reasonable to expect they will. It is even possible that the existing automated analysis already rejects any attempts to use system registers directly.
Obfuscated malware where the malicious part is not obvious: it's distributed across components and requires a separate process/image.
Curious to see whether some smart Apple engineers can invent a fix for this, though given the nature of the vulnerability it seems like the answer is "no way".
The argument was there the entire time. Some people just buried their heads in the sand though.
Firstly, no normal JITC will ever emit instructions that access undocumented system registers. Any JITC that comes from a known trusted source (and they're expensive to develop, so they basically all do) would be signed/whitelisted already and not be a threat anyway.
So what about new/unrecognised programs or dylibs that request JITC access? Well, Apple already insists on many categories of disallowed things in the App Store that can't be detected via static analysis. For example, they disallow changing the behaviour of an app after it is released via downloaded data files, which is both very vague and impossible to enforce statically. So it doesn't fundamentally change the nature of things.
But what if you insist on being able to mitigate your own obscure CPU bugs via static analysis? Well, then XNU can just implement the following strategy:
1. If a dylib requests a JITC entitlement, and the Mach-O CD Hash is on a whitelist of "known legit" compilers, allow.
2. Otherwise, require pages to be W^X. So the JITC requests some writeable pages, fills them with code, and then asks the kernel to make the pages executable. At that point XNU suspends the process and scans the requested pages for illegal instruction sequences. The pages are hot in the cache anyway and the checks are simple, so it's no big deal. If the static checks pass, the page is flipped to executable-but-not-writeable and the app can proceed (a rough sketch of this check follows below).
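A user-space sketch of what step 2 could look like; the mask/value pair covers both the msr and mrs forms of the problem register (the encoding is broken down further on in the thread), and a plain mprotect stands in for what XNU would do kernel-side:

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /* 26 of the 32 bits are fixed across both problem encodings:
       bit 21 selects mrs vs msr and bits 4:0 select the GPR, so
       mask those out before comparing. */
    static int is_bad_insn(uint32_t w) {
        return (w & 0xffdfffe0u) == 0xd51dfa20u;
    }

    /* Hypothetical stand-in for the kernel-side check: refuse to flip
       a writeable page to executable if any aligned word decodes to a
       problem instruction. */
    static int make_executable(void *page, size_t len) {
        const uint32_t *w = page;
        for (size_t i = 0; i < len / 4; i++)
            if (is_bad_insn(w[i]))
                return -1;  /* reject: direct system register access */
        return mprotect(page, len, PROT_READ | PROT_EXEC);
    }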
Apple's ban on JITC has never really made much technical sense to me. It feels like a way to save costs on static analysis investment and to force developers onto Apple's own languages and toolchains, with security used as a fig leaf. It doesn't make malware harder to write, but it definitely exposes Apple to possible legal hot water, as it means competitors can't build competitive first-party web browsers for the platform. The only thing that saves them is their own high prices and refusal to try to grab high enough market share.
*What about iOS?*
iOS is affected, like all other OSes. There are unique privacy implications to this vulnerability on iOS, as it could be used to bypass some of its stricter privacy protections. For example, keyboard apps are not allowed to access the internet, for privacy reasons. A malicious keyboard app could use this vulnerability to send text that the user types to another malicious app, which could then send it to the internet.
The real saving grace here is that iOS app binaries are submitted as LLVM IR instead of ARM machine code.
Uh, no? This is very tractable - O(N) in the size of the binary - just check, for every offset in executable memory that could ever be executed, whether it would decode into a `msr s3_5_c15_c10_1, reg` or `mrs reg, s3_5_c15_c10_1` instruction.
IIUC, decoding an M1 ARM instruction doesn't depend on anything other than the instruction pointer (AArch64 instructions are fixed-width and must be 4-byte aligned), so you only need one pass over the aligned 32-bit words, each checked independently of the ones before it.
Edit: unless its executable section isn't read-only, in which case static analyzers can't prove much of anything with any real confidence.
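For concreteness, here is how those two encodings break down under the standard AArch64 MSR/MRS system-register format (my own derivation; worth double-checking against the ARM ARM):

    /* AArch64 MSR/MRS (register) layout:
     *   bits 31-22: 1101010100    bit 21: L (0 = msr, 1 = mrs)
     *   bits 20-19: op0 (0b11 for s3_*)   bits 18-16: op1
     *   bits 15-12: CRn   bits 11-8: CRm   bits 7-5: op2   bits 4-0: Rt
     * s3_5_c15_c10_1 -> op1=0b101, CRn=0b1111, CRm=0b1010, op2=0b001, so:
     *   msr s3_5_c15_c10_1, Xt  ==  0xd51dfa20 | Rt
     *   mrs Xt, s3_5_c15_c10_1  ==  0xd53dfa20 | Rt
     * A single mask-and-compare catches both:
     *   (insn & 0xffdfffe0) == 0xd51dfa20
     */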
For example, this benign line of code would trip a static analyzer looking for `msr s3_5_c15_c10_1, x15` in the way you described:
uint32_t x = 0xd51dfa2f; /* same 32-bit pattern as `msr s3_5_c15_c10_1, x15` */
There are 26 fixed bits in the problem instructions, which means a false positive rate of one in 256MiB of uniformly distributed constant data (the false positive rate is, of course, zero for executable code, which is the majority of the text section of a binary). Constant data is not uniformly distributed. So, in practice, I expect this to be a rather rare occurrence.
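To spell out the arithmetic behind that figure: checking aligned 32-bit words, a uniformly random word matches with probability 2^-26, so you expect one false hit per 2^26 words, and 2^26 words × 4 bytes/word = 2^28 bytes = 256 MiB.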
I just looked at some Mac binaries, and it seems movk and constant-section loads have largely superseded arm32-style inline constant pools. I still see some data in the text section, but it seems to mostly be offset tables before functions (not sure what it is, might have to do with stack unwinding), none of which seems like it could ever match the instruction encoding for that register. So in practice I don't think any of this will be a problem. It seems this was changed in GCC in 2015; I assume LLVM does the same.
Data segments should go in read-only memory with no write or execute permission.
There are often two kinds of loadable data pages: initialized constants (RO) and initialized variables (RW), so some will need to be writable, because pesky globals will never seem to die. Neither of these should ever have execute, or that will cross the streams and end the universe. I'm annoyed when constants or constant pools are loaded into RW data pages, because it doesn't make sense.
So, it's basically an honor system. You cannot detect JIT, because there aren't "certain instructions" that aren't allowed - it's just certain registers that programs shouldn't access (but access patterns can be changed in branching code to ensure Apple won't catch it in their sandboxes).
Besides, even if certain instructions are not allowed, a program can modify itself, it's hard to detect if a program modifies itself without executing the program under specific conditions, or running the program in a hypervisor.
Edit: fixed name of the chip.
Furthermore, the keyboard app extension and the keyboard app are installed as a single package whose components are not supposed to communicate, which is why I brought this up.
> Then perhaps you should stop reading that news site, just like they stopped reading this site after the first 2 paragraphs.
Marcan is a genius, in every aspect. He is at the top of the list of people I could read all day long without getting annoyed.
Pretty much everything he posts on Twitter is interesting and curious. I'm a huge fan!
The other person I have similar feelings for is Geohot.
These guys are really, really smart.
I don't know about that... https://news.ycombinator.com/item?id=25679907
So this has to be fake then, obviously. Apparently George Hotz (geohot/tomcr00se) won a few CTFs single-handedly.
I'm sure marcan is a genius as well; unfortunately, though, Hotz is somehow still able to stay relevant, continuously.
The only difference between the two is that Geohot does a lot of things for the fame (or at least it seems so), while marcan does it only for fun.
I'm okay with both, tbh; if you are at this level you deserve some fame.
> Wait. Oh no. Some game developer somewhere is going to try to use this as a synchronization primitive, aren't they. Please don't. The world has enough cursed code already. Don't do it. Stop it. Noooooooooooooooo
It's ok George, we love you and you know it.
I tried, but I also talked about it on public IRC before I knew it was a bug and not a feature, so I couldn't do much about that part. ¯\_(ツ)_/¯
This whole site is a good read. A great mix of real information, jokes, and a good send-up of how some security releases appear these days (I understand to a degree the incentives that make those sites what they are, and I don't think they are all bad, but it's still good and useful to poke fun at them, I think).
This is Mark Kettenis, who, despite the jokes made by marcan, has been working with a few other OpenBSD developers to bring up OpenBSD/arm64 on the Apple M1. On the Mac Mini, at least, Gigabit Ethernet and Broadcom Wi-Fi work, and work on the internal NVMe storage is progressing.
There was an early teaser dmesg posted in February showing OpenBSD booting multi-user (on bare metal): https://marc.info/?l=openbsd-arm&m=161386122115249&w=2
Mark has also been adding support for the M1 to the U-Boot project, which will not only benefit OpenBSD, but also Asahi Linux.
Another OpenBSD developer posted these screenshots and videos on Twitter.
And I further expect that they’re already sampling the M chips for the subsequent round of products. Heck, they may even be completely done already.
>Then perhaps you should stop reading that news site, just like they stopped reading this site after the first 2 paragraphs.
This one is my favorite.
Am I missing something or is it somewhat likely this will be "abused" by games?
This was really just a good joke about how the game industry in the past used uncommon hardware features for optimization purposes.
Games usually live in the realm of latency.
I was not expecting such an entertaining FAQ. Good job, very informative, very amusing!
Plus, that's only the CPU side of things. The M1's GPU is annihilated by most GPUs in its class... from 2014. Fast forward to 2021, and its graphics performance is honestly pathetic. Remember our friend the 4800u? Its integrated GPU is able to beat the M1's GPU in raw benchmarks, and it came out 18 months before it.
So yeah, I think there are a lot of workloads where the M1 is a pretty crappy CPU. Unless your workload is CPU-bound, there's not really much of a reason to own one. And even still, the M1 doesn't guarantee compatibility with legacy software. It doesn't have a functional hypervisor, and it has lower IO bandwidth than most CPUs from a decade ago. Not really something I'd consider viable as a "daily driver", at least for my workload.
"AMD's Ryzen 7 4800u hit 4ghz over 8 cores" - It doesn't. AMD specifies it as having 1.8 GHz base clock, 4.2 GHz max boost clock. AMD's cores use ~15W each at max frequency. Since the 4800U's configurable TDP range is 10W to 25W for the whole chip, there is no way that all 8 cores run at 4.2 GHz simultaneously for any substantial period of time. In fact, running even one core in its max performance state probably isn't sustainable in a lot of systems which opt to use the 4800U's default 15W TDP configuration.
On the other side of things, Apple M1 performance cores use ~6W each at max frequency. It is actually possible for all four to run at full performance indefinitely with the whole chip using about 25W, provided there is little GPU load.
"Remember our friend the 4800u? It's integrated GPU is able to beat the M1's GPU in raw benchmarks, and it came out 18 months before it." - Say what? The only direct comparison I've been able to find is 4700U vs M1, in Anandtech's M1 article, and it shows the M1 GPU as 2.6x faster in GFXBench 5.0 Aztec Ruins 1080p offscreen and 2.5x faster in 1440p high.
Granted, the 4700U GPU is a bit slower than the 4800U GPU, but not by a factor of 2 or more.
This isn't an unexpected result, given that the M1's GPU offers ~2.6 single-precision TFLOPs while the 4800U's is ~1.8 TFLOPs.
Literally everything you wrote about M1 being bad is wrongheaded in the extreme, LOL.
But you heard it here first, guys: building CPUs is a chump's game. And you see no reason to celebrate the first genuinely viable, power-efficient and fast non-x86 CPU being a mass success. Fine, I guess, but I don't agree.
Also not sure why you wave away CPU bound workloads as though they don't exist or somehow lesser.
What does that make it, then? Some unicorn device that I'm unworthy of? Is there something wrong with my workload, or Apple's? Apple is marketing the M1 to computer users. I'm a computer user, and I cannot use it as part of my workflow; I have every right to voice that concern to Apple.
> And you see no reason to celebrate the first genuinely viable, power-efficient and fast non-x86 CPU being a mass success.
You must be late to the party; ARM has been around for years. Apple's power efficiency is about on par with what should be expected from a 5nm ARM chip with a gimped GPU. What is there to celebrate, that Apple had the initiative to buy out the entirety of the 5nm node at TSMC, plunging the entire world into a semiconductor shortage unlike anything ever seen before? Yeah, great job Apple. I think it was worth disrupting the global economy so you could ship your supercharged Raspberry Pi /s
> Also not sure why you wave away CPU bound workloads as though they don't exist or somehow lesser.
CPU-bound workloads absolutely exist, but who's running them on a Mac? Hell, more importantly, who's running them on ARM? x86 still has a better value proposition than ARM in the datacenter/server market, and most local workloads are hardware-accelerated these days. I really don't know what to tell you.
Who's running them on ARM? Not many now, but everything starts somewhere.
It's called progress. You say it's 'to be expected' - well no one else has done it, have they?
not every register!
Also, I am still not sure if this is a disclosure, performance art, or extremely dry comedy, but it certainly covered all the bases.
The Newton wasn't really Apple Silicon:
The OMP/MP100/MP110/MP120/MP130 ran an ARM610.
The eMate 300 ran an ARM710.
The MP2000/MP2100 ran a DEC StrongARM SA-110 CPU.
None of which were designed or manufactured by Apple.
ARM, the company, only existed because Apple wanted them to manufacture a CPU for its Newton project.
While Apple might not have designed the ARM610, they technically owned it.
On 27 Nov 1990, ARM was formed with Apple owning 43%, alongside Acorn (the designer) and VLSI Technology (the manufacturer).
Funny thing: I've found two articles that claim two different purchase prices for that 43%: one $3M and the other $1.5B. That's quite a difference!
Nope, Apple never owned 50% of ARM.
> ARM, the company, only existed because Apple wanted them to manufacture a CPU for its Newton project.
Who knows what would have happened had Apple not invested but Apple was never ARM's only customer.
> While Apple might not have designed the ARM610, they technically owned it.
If I own some Apple shares, I'm reasonably sure that doesn't mean that I "technically" own the M1.
I would assume a huge JITed VM implementation would show up easily in analysis.
>Can malware use this vulnerability to take over my computer? No.
>Can malware use this vulnerability to steal my private information? No.
> Poking fun at how ridiculous infosec clickbait vulnerability reporting has become lately. Just because it has a flashy website or it makes the news doesn't mean you need to care.
> If you've read all the way to here, congratulations! You're one of the rare people who doesn't just retweet based on the page title :-)
I'd argue this is not the case. What mainstream operating systems have made credible attempts to eliminate covert channels from, e.g., timing or resources that can be made visible by cooperating processes across user account boundaries?
Without this vulnerability, there would still be a million ways to send data between cooperative processes running as different users on Mac OS X.
For example, a process could start subprocesses at a deterministic rate and the other end of the covert link observes how fast the pid counter is going up.
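A toy sketch of the receiving end of that channel, assuming (hypothetically) a kernel that hands out pids sequentially; the sender would signal a 1-bit by forking rapidly during the interval and a 0-bit by staying idle:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Fork a child that exits immediately; its pid tells us where the
       system-wide pid counter currently stands. */
    static pid_t sample_pid_counter(void) {
        pid_t p = fork();
        if (p == 0)
            _exit(0);
        waitpid(p, NULL, 0);
        return p;
    }

    int main(void) {
        pid_t before = sample_pid_counter();
        sleep(1);  /* one signalling interval */
        pid_t after = sample_pid_counter();
        /* Big jump => the sender was forking => bit 1; the threshold is a guess. */
        printf("bit: %d\n", (after - before) > 50);
        return 0;
    }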
This is a non-vulnerability, because it targets something there was no effort to protect.
I mean, it's not like anyone has a native Facebook or Instagram app on their device, but just to name an example.
The M1 is used in the iPad Pro so your example is definitely possible. (or your comment was sarcasm in which case: woosh to myself)
All of them.
A piece of software able to read my mail but not use the Internet could credibly be a tool to help me index and find my email using search keywords. It promises to not use the Internet, and indeed nm/objdump shows no use of networking tools.
Another piece of software able to monitor RSS feeds I am interested in and alert me to their changes is expected to use the Internet, but not the filesystem, and surely not the part of the filesystem that contains my email. I can use strace/dtruss to verify it never touches the filesystem, and use chroot/jail to keep it honest.
This being said, I agree that "mainstream operating systems" (meaning Windows and macOS, but perhaps not iOS) don't do enough, and it might be impossible for them without changing user expectations, but I think they're trying. Web browsers disabled high-resolution timers specifically to protect against this sort of thing. iOS doesn't permit arbitrary background tasks to run, to protect battery and ostensibly privacy. But they could all do better.
For example, for me high CPU load is a red flag - a program that does this to me regularly gets put into a VM so that I can mess with its time. Zoom now loses about a minute every three if it's not focused, which is annoying because it messes with the calendar view, but I'm pretty sure it can't do anything else I don't want it to. Who should do this work? My operating system? Zoom? Neither will do it if users don't demand it.
Paged shared libraries, signalling by ramping up and down CPU usage, there are an enormous number of possible covert channels.
The answer will depend on whether you consider Multi-Level Security (MLS) https://en.wikipedia.org/wiki/Multilevel_security "mainstream". It's certainly a well-established approach, if only in an academic sense, and the confluence of new use cases (such as secretive, proprietary "apps" being expected to manage sensitive user data) and increasingly-hard-to-mitigate info disclosure vulnerabilities has made it more relevant than ever.
Are the chip registers not protected? What's the mechanism that's allowing this data sharing to happen?
They could also just both ping a server to exchange data.
For the OS to have a say, the CPU would need to provide a way for the OS to tell it (usually by setting certain bits in other, privileged registers) that access should not be allowed, at least under certain circumstances.
The article actually does go into certain situations where the access is more restricted (search for "VHE"), but also in how that does not really apply here.
> originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel.
Somewhat critically, it will also drop down to EL0.
Which macOS's kernel doesn't.
Plus, as I said above, this is prone to false positives anyway because the executable section on ARM also includes constant pools.
> Because pthread_jit_write_protect_np changes only the current thread’s permissions, avoid accessing the same memory region from multiple threads. Giving multiple threads access to the same memory region opens up a potential attack vector, in which one thread has write access and another has executable access to the same region.
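For context, the per-thread dance that quote describes looks roughly like this on Apple Silicon macOS (a sketch; it assumes `buf` came from `mmap` with `MAP_JIT` in a process holding the JIT entitlement, and `emit_code` is a made-up name):

    #include <pthread.h>
    #include <string.h>
    #include <libkern/OSCacheControl.h>

    /* Copy freshly generated machine code into a MAP_JIT region and make
       it runnable, without this thread ever holding W and X at once. */
    void emit_code(void *buf, const void *code, size_t len) {
        pthread_jit_write_protect_np(0);  /* this thread: region writable, not executable */
        memcpy(buf, code, len);
        pthread_jit_write_protect_np(1);  /* this thread: region executable, not writable */
        sys_icache_invalidate(buf, len);  /* discard stale instruction-cache lines */
    }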
However, this would be prone to false positives, as constant pools are in the executable section on ARM.
Apple can't just scan for a bad byte sequence in executable pages because it could also represent legitimate constants used by the program. (not sure if this part is correct?)
If so, doesn't that make detection via static analysis infeasible unless LLVM is patched to avoid writing bad byte sequences in constant pools? Otherwise they have to risk rejecting some small number of non-malicious binaries, which might be OK, depending on the likelihood of it happening.
With certain restrictions, it is possible to do this: Google Native Client has a verifier which checks that programs it executes do not jump into the middle of other instructions, forbids run-time code generation inside such programs, etc.
I don't think Rice's Theorem applies here. As a counterexample: On a hypothetical CPU where all instructions have fixed width (e.g. 32 bits), if accessing a register requires the instruction to have, say, the 10th bit set, and all other instructions don't, and if there is no way to generate new instructions (e.g. the CPU only allows execution from ROM), then it is trivial to check whether there is any instruction in ROM that has bit 10 set.
The next part I'm less sure how to state rigorously (I'm not in the field): in our hypothetical CPU, disallowing that instruction either leaves you Turing-complete or it doesn't. In the former case, you can still compute everything a Turing machine can.
But the M1 does have a way to "generate new instructions" (i.e., JIT), so that counterexample doesn't hold for it.
But I wanted to show how Rice's Theorem does not generally apply here. You can make up other examples: a register that needs an instruction with a length of 1000 bytes, yet the ROM only has 512 bytes of space, etc.
As for JIT, also correct (hence my condition), though that's also a property of the OS and not just the M1 (and on iOS for example, it is far more restricted what code is allowed to do JIT, as was stated in the thread already).
1) the program does not contain an instruction that touches s3_5_c15_c10_1
2) the program contains an instruction that touches s3_5_c15_c10_1, but never executes that instruction
3) the program contains an instruction that touches s3_5_c15_c10_1, and uses it
Rice's theorem means we cannot tell whether a program will touch the register at runtime (as that's a dynamic property of the program). But that's because we cannot tell case 2 from case 3. It's perfectly decidable whether a program is in case 1 (as that's a static property of the program).
Any sound static analysis must have false positives -- but those are exactly the programs in case 2. It doesn't mean we end up blocking other kinds of instructions.
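A concrete case-2 program, for illustration (the raw word below is assumed to be the `mrs` encoding discussed elsewhere in the thread): a sound scanner must flag it, even though the instruction is provably dead.

    #include <stdio.h>

    /* Contains the problem encoding, but never executes it. */
    static void never_called(void) {
        __asm__ volatile(".inst 0xd53dfa20" ::: "x0");  /* mrs x0, s3_5_c15_c10_1 */
    }

    int main(void) {
        if (0)
            never_called();  /* statically unreachable */
        puts("this program never touches the register");
        return 0;
    }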
If you don't have misbehaving programs on your computer that want to secretly communicate, then it doesn't matter.
> If you already have malware on your computer, that malware can communicate with other malware on your computer in an unexpected way.
> Chances are it could communicate in plenty of expected ways anyway.
> Yeah, but originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel. Other than running in EL1/0 with TGE=0 (i.e. inside a VM guest), there's no known way to block it.
In other words: this register is shared between cores, so if the two processes are running simultaneously on different cores, they can communicate by reading & writing directly to & from this register, without any operating system interaction.
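Concretely, each half of such a channel is a single unprivileged instruction (a sketch with made-up function names; per the writeup only two bits of the register are usable from EL0, so a real transfer needs clocking and encoding on top, and both processes must land on cores in the same cluster):

    #include <stdint.h>

    static inline void channel_send(uint64_t v) {
        __asm__ volatile("msr s3_5_c15_c10_1, %0" : : "r"(v));
    }

    static inline uint64_t channel_recv(void) {
        uint64_t v;
        __asm__ volatile("mrs %0, s3_5_c15_c10_1" : "=r"(v));
        return v;
    }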
> Apple decided to break the ARM spec by removing a mandatory feature
Is there a page documenting all incompatibilities / violations of the ARM architecture specification by the M1?
After all, process isolation between cooperating processes is nearly impossible to achieve. If Apple closes this loophole, there will be other, lower-bandwidth side channels, like spinning up the fan in Morse code while the other process notices the clock speed scaling up and down...
There are hundreds of Apple implementation-defined registers; we're documenting them as we learn more about them.
Ok I nearly fell out of my chair. A+
In all seriousness, I wonder what the actual issue is.
Could anyone comment on the implications of only supporting a Type 2 hypervisor that is (as said on the site) "in violation of the ARMv8 specification"?
It's just a very unfortunate coincidence that precisely that support would allow this bug to be trivially mitigated on Linux. (Wouldn't help macOS, as they'd have to implement this from scratch anyway; it's just that existing OSes that support this mode could use it).
The actual issue is just what I described: the hardware implementation of this register neglects to check for and reject accesses from EL0 (userspace). It's a chip logic design flaw. I don't know exactly where it is (whether in the core/instruction decoder, or in the cluster component that actually holds the register, depending on where they do access controls), but either way that's what the problem is.
This one is a minor side note but there could be other vulnerabilities that could be resolved if the specifications were followed (I assume).
So it's not that not following the spec prevents the workaround, it's just that had they followed the spec it would just take a single kernel command line argument (to force non-VHE mode) to fix this in Linux, while instead, now we'd have to make major changes to KVM to make the non-VHE code actually work with VHE, and really nobody wants to do that just to mitigate this silly thing.
Had this been a more dangerous flaw (e.g. with DoS or worse consequences), OSes would be scrambling to make major reworks right now to mitigate it in that way. macOS would have to turn its entire hypervisor design on its head. Possible, but not fun.
Some documentation would be nice...
On the Linux side, would qemu user-mode emulation work for that (maybe with a patch to take advantage of the M1's switchable-memory-order thing)?
If nothing else though, I plan to expose at least the TSO feature of the M1 so qemu can reduce the overhead of its memory accesses.
This vulnerability enables different apps to communicate a super-cookie for cross-app tracking. A possible exploit would be to implement this feature in an ad SDK used by many different apps.
Without such a silicon vulnerability, the malicious process would need all its components within a single process/image?
> • OpenBSD users: Hi Mark!
Yes, "Hi Mark", whoever you are and no matter that I'm not an OpenBSD user.
If you are the author, thanks for the read.
Twitter handle matches.
If you do end up watching and liking it, there is a book and a film about the making of the film to enjoy :D https://en.wikipedia.org/wiki/The_Disaster_Artist