M1racles: An Apple M1 covert channel vulnerability (m1racles.com)
1067 points by paulgerhardt on May 26, 2021 | 277 comments



While Marcan has written in a very entertaining fashion, there is perhaps one application of this vulnerability that wasn't considered.

If this can be reproduced on the iPhone, it can lead to 3rd party keyboards exfiltrating data. By default, keyboard app extensions are sandboxed away from their owning applications [0], but they may communicate with the app over this channel and leak data. It's not as easy as I describe because the app would have to be alive and scheduled on the same cluster, but it's within the realm of possibility.

[0]: https://developer.apple.com/library/archive/documentation/Ge...


This exact use case is touched on in the article.

Here is the follow-up

> However, since iOS apps distributed through the App Store are not allowed to build code at runtime (JIT), Apple can automatically scan them at submission time and reliably detect any attempts to exploit this vulnerability using static analysis (which they already use). We do not have further information on whether Apple is planning to deploy these checks (or whether they have already done so), but they are aware of the potential issue and it would be reasonable to expect they will. It is even possible that the existing automated analysis already rejects any attempts to use system registers directly.


Full disclosure: I added this after the parent comment (and others) mentioned this case. :)


Thanks - yeah that is a real flaw.

Obfuscated malware where the malicious part is not obvious; it's distributed and requires a separate process/image.

Curious to see if some smart Apple-ers can invent a fix for this, though it seems like "no way" given the vulnerability.


As I mentioned below and on the disclosure page, it's trivial for Apple to reliably detect this in apps submitted to the App Store and reject them, so I'm not worried. There's no such thing as "obfuscated" malware in the traditional sense on the App Store. You can obfuscate the code flow all you want, but all executable code has to be signed to run on iDevices. If you try to use this register, the instruction will be there for all to see. You can't use self-modifying code or packers on iOS.


I expect Apple to include checks for this in their App Store static analyzer, if they aren't already rejecting sysreg instructions, which mitigates the issue. Obviously JIT isn't allowed in the App Store, so this should be an effective strategy.


How convenient for Apple. Now they finally have a good argument to keep forbidding JIT compilation and side-loading.


> Now they finally have a good argument to keep forbidding JIT compilation and side-loading.

The argument was there the entire time. Some people just buried their heads in the sand though.


JITC is irrelevant actually. This is not an argument for blocking it.

Firstly, no normal JITC will ever emit instructions that access undocumented system registers. Any JITC that comes from a known trusted source (and they're expensive to develop, so they basically all do) would be signed/whitelisted already and not be a threat anyway.

So what about new/unrecognised programs or dylibs that request JITC access? Well, Apple already insist on creating many categories of disallowed thing in the app store that can't be detected via static analysis. For example, they disallow changing the behaviour of the app after it is released via downloaded data files, which is both very vague and impossible to enforce statically. So it doesn't fundamentally change the nature of things.

But what if you insist on being able to specifically fix your own obscure CPU bugs via static analysis? Well, then XNU can just implement the following strategy:

1. If a dylib requests a JITC entitlement, and the Mach-O CD Hash is on a whitelist of "known legit" compilers, allow.

2. Otherwise, require pages to be W^X. So the JITC requests some writeable pages, fills them with code, and then requests the kernel to make the pages executable. At that point XNU suspends the process and scans the requested pages for illegal instruction sequences. The pages are hot in the cache anyway and the checks are simple, so it's no big deal. If the static checks pass, the page is flipped to be executable but not writeable and the app can proceed.

Apple's ban on JITC has never really made much technical sense to me. It feels like a way to save costs on program static analysis investment and to try and force developers to use Apple's own languages and toolchains, with security being used as a fig leaf. It doesn't make malware harder to write but it definitely exposes them to possible legal hot water as it means competitors can't build first-party competitive web browsers for the platform. The only thing that saves them is their own high prices and refusal to try and grab high enough market share.
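The two-step policy sketched above can be modeled in a few lines. The following Python is a toy illustration, not real XNU behavior: `Page`, `request_exec`, and the whitelist are all invented names, and the mask/value pair is derived from the `msr`/`mrs s3_5_c15_c10_1` encodings discussed elsewhere in this thread (the two differ only in bit 21, the read/write bit).

```python
# Toy model of the proposed policy: pages start writable-not-executable, and
# the "kernel" only flips W -> X after whitelisting or a scan for the
# forbidden system-register encodings. All names here are invented.
BANNED_MASK, BANNED_VALUE = 0xFFDFFFE0, 0xD51DFA20  # msr/mrs s3_5_c15_c10_1

class Page:
    def __init__(self):
        self.data = bytearray(4096)
        self.writable, self.executable = True, False  # W^X: starts W, not X

def request_exec(page, cd_hash=None, whitelist=frozenset()):
    # Step 1: known-legit JITC (CD Hash on the whitelist) -> allow directly.
    if cd_hash is not None and cd_hash in whitelist:
        page.writable, page.executable = False, True
        return True
    # Step 2: scan the page for illegal instruction words before flipping.
    words = (int.from_bytes(page.data[i:i + 4], "little")
             for i in range(0, len(page.data), 4))
    if any(w & BANNED_MASK == BANNED_VALUE for w in words):
        return False  # scan failed: page stays writable, never executable
    page.writable, page.executable = False, True
    return True
```

The key design point is that the scan happens at the single W-to-X transition, so the kernel never has to race a program that rewrites its own code.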


Possibly, the article has been updated in the last couple of hours, but it now says:

*What about iOS?*

iOS is affected, like all other OSes. There are unique privacy implications to this vulnerability on iOS, as it could be used to bypass some of its stricter privacy protections. For example, keyboard apps are not allowed to access the internet, for privacy reasons. A malicious keyboard app could use this vulnerability to send text that the user types to another malicious app, which could then send it to the internet.


There would be code signatures that can detect this use by apple?


Detection is very hard if the developer employs very clever obfuscation. See: halting problem.


Only if detection requires solving the halting problem. It does not. You just look for certain instructions that normal code shouldn't use. JIT isn't allowed (which means all instructions the program uses can be checked statically), so it should be easy enough.


Marcan said elsewhere in the thread that the executable section on ARM also includes constant pools, so if I understand correctly, you can hide instructions in there and make it intractable for a static analyzer to determine whether they are really instructions or just data.

The real saving grace here is that iOS app binaries are submitted as LLVM IR instead of ARM machine code.


> you can hide instructions in there and make it intractable for a static analyzer to determine whether they are really instructions or just data.

Uh, no? This is very tractable - O(N) in the size of the binary - just check, for every single byte offset in executable memory, whether that offset, if jumped to or continued to from the previous instruction, would decode into a `msr s3_5_c15_c10_1, reg` or `mrs reg, s3_5_c15_c10_1` instruction.

IIUC, the decoding of an M1 ARM instruction doesn't depend on anything other than the instruction pointer, so you only need one pass, and you only need to decode one instruction at each offset, since the following instruction will occur at a later byte address.

Edit: unless its executable section isn't read-only, in which case static analyzers can't prove much of anything with any real confidence.
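A minimal sketch of that O(N) scan, in Python. The mask/value pair below is an assumption derived from the encodings mentioned in this thread (`msr s3_5_c15_c10_1, x15` = 0xd51dfa2f; `mrs` flips bit 21, and the low five bits are Rt), so a single masked compare covers both directions for any register operand.

```python
import struct

# msr s3_5_c15_c10_1, Xt -> 0xd51dfa20 | Rt   (e.g. x15 gives 0xd51dfa2f)
# mrs Xt, s3_5_c15_c10_1 -> 0xd53dfa20 | Rt   (differs only in bit 21)
MASK, VALUE = 0xFFDFFFE0, 0xD51DFA20

def scan_for_sreg(code: bytes) -> list:
    """Return every byte offset whose 32-bit little-endian word decodes to
    an access of the forbidden system register."""
    hits = []
    for off in range(len(code) - 3):   # every byte offset, as described above
        (word,) = struct.unpack_from("<I", code, off)
        if word & MASK == VALUE:
            hits.append(off)
    return hits
```

One pass, one decode per offset, linear in the binary size.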


Yes but if program constants are in executable memory, then you can end up with byte sequences that represent numeric values but also happen to decode into the problematic instructions.

For example, this benign line of code would trip a static analyzer looking for `msr s3_5_c15_c10_1, x15` in the way you described:

  uint32_t x = 0xd51dfa2f;


I said false positives are an issue in the context of a "dumb" real-time kernel-side scan. App Store submission is different. They can afford to have false positives and have a human look at them to see if they look suspicious.

There are 26 fixed bits in the problem instructions, which means a false positive rate of one in 256MiB of uniformly distributed constant data (the false positive rate is, of course, zero for executable code, which is the majority of the text section of a binary). Constant data is not uniformly distributed. So, in practice, I expect this to be a rather rare occurrence.

I just looked at some Mac binaries, and it seems movk and constant-section loads have largely superseded arm32-style inline constant pools. I still see some data in the text section, but it seems to mostly be offset tables before functions (not sure what it is; it might have to do with stack unwinding), none of which seems like it could ever match the instruction encoding for that register. So in practice I don't think any of this will be a problem. It seems this was changed in gcc in 2015 [0]; I assume LLVM does the same.

[0] https://gcc.gnu.org/pipermail/gcc-patches/2015-November/4334...
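The 256 MiB figure above checks out arithmetically, assuming aligned 32-bit words of uniformly random constant data:

```python
# 26 fixed bits -> a random aligned 32-bit word matches with probability
# 2**-26, i.e. about one false positive per 2**26 words of constant data.
fixed_bits = 26
words_per_false_positive = 2 ** fixed_bits
bytes_per_false_positive = words_per_false_positive * 4  # 4 bytes per word
assert bytes_per_false_positive == 256 * 1024 * 1024     # one per 256 MiB
```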


That makes sense. I'm glad to be wrong :-)


Only on watchOS is Bitcode required (to support the watch's 32-bit to 64-bit transition), on all other platforms it's optional and often turned off, as it makes a variety of things harder, like generating dSYMs for crash reporting.


Oh. Then I don't see how this can be reliably mitigated, other than patching LLVM to avoid writing the `msr s3_5_c15_c10_1` byte sequence in constant pools and then rejecting any binary that contains the byte sequence in an executable section. That seems difficult to get done before someone is able to submit a PoC malicious keyboard to the store, potentially turning this "joke" bug into a real problem. What am I missing?


That's problematic. Allowing the constant pools in executable memory is a bad idea.

Data segments should go in read only memory with no write or execute permission.


W^X, except for transmuting user code pages to data pages (reading its own code should be fine, since it was loaded from a user binary anyhow), or a supervisor-level JIT helper to check and transmute user data pages into user code pages (checking that user-mode JITs aren't being naughty).

There are often two kinds of loadable data pages: initialized constants (RO) and initialized variables (RW), so some will need to be writable because pesky globals never seem to die. Neither of these should ever have execute, or that will cross the streams and end the universe. I'm annoyed when constants or constant pools are loaded into RW data pages, because it doesn't make sense.


Does the IR help if you're obfuscating instructions as static data?


> JIT isn't allowed

So, it's basically an honor system. You cannot detect JIT, because there aren't "certain instructions" that aren't allowed - it's just certain registers that programs shouldn't access (but access patterns can be changed in branching code to ensure Apple won't catch it in their sandboxes).

Besides, even if certain instructions are not allowed, a program can modify itself, it's hard to detect if a program modifies itself without executing the program under specific conditions, or running the program in a hypervisor.


You're missing the point, JIT not allowed means programs may not modify themselves. They're in read+execute only memory and cannot allocate writable+executable memory.


iPhones use the A12/A13/A14 chips, and the vulnerability is not confirmed there. Also, the post mentions that if you have two malware apps on your device, they can communicate in many other ways, so I'm not sure what's new here.

Edit: fixed name of the chip.


I just tested it on the A14 and it seemed to work there.


I wonder if it would have passed Apple's review process?


At this point I would hope that App Store ingestion would filter for this.


iPhones do not use the A1 chip as of quite a few years ago. Besides, the M1 and the A12+ have significant microarchitectural similarities, to the point that the DTK used the A12Z.

Furthermore, the keyboard app extension and the keyboard app are installed as a single package whose components are not supposed to communicate, hence why I brought this up.


I believe the only significant difference between the A14 and the M1 (apart from the package) is the number of cores.


The only 1 in the name of the chip is a typo. As for the rest, I'm still not sure if it is significant.


The iPad Pro contains an M1 chip, so that might be a better example.


> I came here from a news site and they didn't tell me any of this at all!

>

> Then perhaps you should stop reading that news site, just like they stopped reading this site after the first 2 paragraphs.

Marcan is a genius, in every aspect. He is on my top list of people I could read all day long without getting annoyed.

Pretty much everything he posts on Twitter is interesting and curious. I'm a huge fan!

The other person I have similar feelings for is Geohot.

These guys are really, really smart.


> The other person I have similar feelings for is Geohot.

I don't know about that... https://news.ycombinator.com/item?id=25679907


> But it is no surprise that George Hotz, working alone as team tomcr00se, would rise to the top of the CTF.

So this has to be fake, then, obviously. Apparently George Hotz (geohot/tomcr00se) won a few CTFs single-handedly [0][1].

I'm sure that marcan is a genius as well; unfortunately, though, Hotz is somehow still able to stay relevant, continuously.

[0] https://www.koscom.co.kr/eng/bbs/B0000043/view.do?nttId=1040...

[1] https://www.prnewswire.com/news-releases/nyu-poly-cyber-secu...


You gave denysvitali some serious cognitive dissonance.


Indeed. Lol

The only difference between the two is that Geohot does a lot of things for the fame (or at least it seems so), and marcan does it only for fun.

I'm okay with both, tbh; if you are at this level, you deserve some fame.


Hearing an S-Tier hacker call a fellow S-Tier hacker B-Tier is certainly entertaining, but from my lowly perspective they're still far more capable than 99% of devs I'll ever encounter.


Yes, he's a great self promoter and a genius level engineer. You can watch his livestreams to see both for yourself.


I also really liked this line

> Wait. Oh no. Some game developer somewhere is going to try to use this as a synchronization primitive, aren't they. Please don't. The world has enough cursed code already. Don't do it. Stop it. Noooooooooooooooo



>The other person I have similar feelings for is Geohot. These guys are really, really smart.

Its ok George, we love you and you know it


Ah! I'm not sure he would really like the comparison with geohot.


Was this responsibly disclosed?

I tried, but I also talked about it on public IRC before I knew it was a bug and not a feature, so I couldn't do much about that part. ¯\_(ツ)_/¯

This whole site is a good read. A great mix of real information, jokes, and a good send-up of how some security releases appear these days (I understand to a degree the incentives that cause those sites to be as they are, and I don't think they are all bad, but it's still good and useful to poke fun at them, I think).


> "OpenBSD users: Hi Mark!"

This is Mark Kettenis, who, despite comments made jokingly by marcan, has been working with a few other OpenBSD developers to bring up OpenBSD/arm64 on the Apple M1. On the Mac Mini, at least, Gigabit Ethernet and Broadcom Wi-Fi work, and support for the internal NVMe storage is progressing.

There was an early teaser dmesg posted in February showing OpenBSD booting multi-user (on bare metal): https://marc.info/?l=openbsd-arm&m=161386122115249&w=2

Mark has also been adding support for the M1 to the U-Boot project, which will not only benefit OpenBSD, but also Asahi Linux.

Another OpenBSD developer posted these screenshots and videos on Twitter.

https://twitter.com/bluerise/status/1359644736483655683

https://twitter.com/bluerise/status/1354216838406823936


I'm almost as impressed that m1racles.com was available as I am with people who are good enough at this kind of reverse engineering that they can do it for fun.


Quick, register all words that start with "mi" for future vulnerabilities. I'm waiting for M1RAGE and M1TIGATE myself.


You joke but somewhere there's a domainer doing this right now.


I'll give them a couple too: M1GHT, M1CRO*, M1ASMA (M1ASTHMA?), M1D*, M1FFED (with some 0xFFED somewhere?), M1GRATE (for some particularly pesky data extraction hack?), M1LES (for some unit conversion bug that makes the first MacOs-based spaceship crash)


On a hunch I tried "myasthma.com" to see if that spelling were free. TIL that redirects to GlaxoSmithKline's en-gb homepage!


M1GRAIN for something about atomic ops.


M1LF


That would be a line feed related parsing vulnerability in a messaging app on M1.


M1XOMATOSIS, M1G-29, M1XMASTRM1KE, M1CKEYM1CE, M1L1TTLEPWN1E…


... for those who like Apple a bit too much.


m1stake, m1aculpa


Is m1aculpa like the mar1o spelling of the word? MamaM1a!


M1SASMA - for a particularly stinky bug.


A "domainer" — that must be one of the ultimate pejorative descriptions of a person.


Is it though? They refer to themselves as domainers quite often. (I work with them indirectly)


It'll have a short shelf life. Won't the M2 be out within a year?


Soon enough we'll get the M9 and then the MX because of reasons. And SE / Lite / Pro versions.


I would suspect the final versions of M chips for all the first round Apple Silicon Macs are all taped out with no respins planned.

And I further expect that they’re already sampling the M chips for the subsequent round of products. Heck, they may even be completely done as well.


MeToo


I'm constantly surprised what domains are still available. I've registered many 2/3-letter domains (with 3-4 letter TLDs) in the past year, as well as ones for very common nouns (some also 3 letters), almost always for under $40. Admittedly it's mostly for the newer TLDs, though.


Similar story. I own a half-dozen relatively recently-registered three-letter domains at two-letter ccTLDs. I’m surprised every time one turns out to be available at normal rates.


>I came here from a news site and they didn't tell me any of this at all!

>Then perhaps you should stop reading that news site, just like they stopped reading this site after the first 2 paragraphs.

This is my most favorite


> Wait. Oh no. Some game developer somewhere is going to try to use this as a synchronization primitive, aren't they. Please don't. The world has enough cursed code already. Don't do it. Stop it. Noooooooooooooooo


Cross-core communication without going through the kernel seems like a very useful performance feature for games.

Am I missing something or is it somewhat likely this will be "abused" by games?


You can already communicate between apps without going through the kernel by using shared memory, with much higher bandwidth. And even the regular write/sendmsg/etc. calls are probably more efficient despite going through the kernel, since each call can carry many more bytes.

This was really just a good joke touching on how the game industry in the past used uncommon hardware features for optimization purposes.
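For reference, the shared-memory route is close to a one-liner in most runtimes. A small Python sketch, with both ends living in one process purely for illustration (in reality the second attach would happen in a different process that knows the segment name):

```python
from multiprocessing import shared_memory

# "Sender" creates a named segment; the OS picks a unique name for us.
seg = shared_memory.SharedMemory(create=True, size=64)
try:
    seg.buf[:5] = b"hello"                               # sender writes

    # "Receiver" attaches to the same segment by name and reads.
    peer = shared_memory.SharedMemory(name=seg.name)
    received = bytes(peer.buf[:5])
    peer.close()
finally:
    seg.close()
    seg.unlink()

print(received)  # prints b'hello'
```

Once both sides have the mapping, reads and writes are plain memory accesses, with no syscall per message, which is why this beats the covert register channel on bandwidth.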


This thing communicates at 1MB/s. A “performance feature” it ain’t.


throughput and latency are different measures.

Games usually live in the realm of latency.


That raises the question: what is the latency, and what kind of feature would you anticipate this could be useful for?


Synchronization primitives AFAIK don't need to transfer huge amounts of data in a short time. One bit for every "okay" signal would suffice. At the given speed you can perform 8 million syncs per second between two threads.
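The 8-million figure follows directly from the reported throughput, assuming decimal megabytes and one bit per signal:

```python
# ~1 MB/s channel, one-bit "okay" signals (decimal MB assumed; the exact
# figure shifts slightly if you mean MiB instead).
throughput_bytes_per_s = 1_000_000
bits_per_signal = 1
signals_per_s = throughput_bytes_per_s * 8 // bits_per_signal
assert signals_per_s == 8_000_000
```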


who cares, it's not like you're going to use it to send textures from one thread to another


Even if it were a performance saver (it isn't), it'd break when new silicon is released with this issue fixed.


I checked this out to find out just... information I guess? I don’t own an M1 but plan to get an ARM Mac when I can budget it. Good to be aware of the landscape.

I was not expecting such an entertaining FAQ. Good job, very informative, very amusing!


Why would you spend money on crappy, locked-down hardware that can't be fixed? A computer that you don't own but basically rent. Get a Lenovo ThinkPad and join the light side; you'll be amazed!


Whatever your opinions on Apple's policies and behavior, it's just ignorant to call the M1 "crappy" when it absolutely annihilates any processor in its class and doesn't at all get embarrassed when compared to high-end desktop CPUs.


CPUs are a chump's game, and it's no surprise that Apple, the company with sole access to next-generation silicon, was able to reach last-generation performance on a laptop chip. Nobody freaked out when AMD's Ryzen 7 4800u hit 4ghz over 8 cores, and I don't see a reason why I should freak out now when Apple's doing it with 10 fewer watts.

Plus, that's only the CPU side of things. The M1's GPU is annihilated by most GPUs in its class... from 2014. Fast forward to 2021, and its graphics performance is honestly pathetic. Remember our friend the 4800u? It's integrated GPU is able to beat the M1's GPU in raw benchmarks, and it came out 18 months before it.

So yeah, I think there are a lot of workloads where the M1 is a pretty crappy CPU. Unless your workload is CPU-bound, there's not really much of a reason to own one. And even then, the M1 doesn't guarantee compatibility with legacy software, it doesn't have a functional hypervisor, and it has lower I/O bandwidth than most CPUs from a decade ago. Not really something I'd consider viable as a "daily driver", at least for my workload.


"CPUs are a chump's game" - what? High performance CPUs which nevertheless use very little power are extremely difficult to design.

"AMD's Ryzen 7 4800u hit 4ghz over 8 cores" - It doesn't. AMD specifies it as having 1.8 GHz base clock, 4.2 GHz max boost clock. AMD's cores use ~15W each at max frequency. Since the 4800U's configurable TDP range is 10W to 25W for the whole chip, there is no way that all 8 cores run at 4.2 GHz simultaneously for any substantial period of time. In fact, running even one core in its max performance state probably isn't sustainable in a lot of systems which opt to use the 4800U's default 15W TDP configuration.

On the other side of things, Apple M1 performance cores use ~6W each at max frequency. It is actually possible for all four to run at full performance indefinitely with the whole chip using about 25W, provided there is little GPU load.

"Remember our friend the 4800u? It's integrated GPU is able to beat the M1's GPU in raw benchmarks, and it came out 18 months before it." - Say what? The only direct comparison I've been able to find is 4700U vs M1, in Anandtech's M1 article, and it shows the M1 GPU as 2.6x faster in GFXBench 5.0 Aztec Ruins 1080p offscreen and 2.5x faster in 1440p high.

Granted, the 4700U GPU is a bit slower than the 4800U GPU, but not by a factor of 2 or more.

This isn't an unexpected result given that the M1's GPU offers ~2.6 single-precision TFLOPs while the 4800U's is ~1.8 TFLOPs.

Literally everything you wrote about M1 being bad is wrongheaded in the extreme, LOL.


Not being viable as your daily driver does not make it crappy.

But you heard it here first, guys: building CPUs is a chump's game. And you see no reason to celebrate the first genuinely viable, power-efficient, and fast non-x86 CPU being a mass success. Fine, I guess, but I don't agree.

Also not sure why you wave away CPU bound workloads as though they don't exist or somehow lesser.


> Not being viable as your daily driver does not make it crappy.

What does it make it then? Some unicorn device that I'm unworthy of? Is there something wrong with my workload, or Apple's? Apple is marketing the M1 to computer users. I'm a computer user, and I cannot use it as part of my workflow, I have every right to voice that concern to Apple.

> And you see no reason to celebrate the first genuinely viable, power-efficient and fast non x86 CPU being a mass success.

You must be late to the party; ARM has been around for years. Apple's power efficiency is about on par with what should be expected from a 5nm ARM chip with a gimped GPU. What is there to celebrate, that Apple had the initiative to buy out the entirety of the 5nm node at TSMC, plunging the entire world into a semiconductor shortage unlike anything ever seen before? Yeah, great job Apple. I think it was worth disrupting the global economy so you could ship your supercharged Raspberry Pi /s

> Also not sure why you wave away CPU bound workloads as though they don't exist or somehow lesser.

CPU-bound workloads absolutely exist, but who's running them on a Mac? Hell, more importantly, who's running them on ARM? x86 still has a better value proposition than ARM in the datacenter/server market, and most local workloads are hardware-accelerated these days. I really don't know what to tell you.


Audio production is almost entirely CPU bound, to give one example.

Who's running them on ARM? Not many now, but everything starts somewhere.

It's called progress. You say it's 'to be expected' - well no one else has done it, have they?


Yeah, after two failed MacBooks from 2016 because of their SSDs, I can just say: stay away from Apple hardware until they reverse course on storage devices.


Why would you let my computer preference affect you this much?


> locked down hardware

not every register!


Well, the performance is pretty good...


“Aaaaa look at me I am right you are wrong aaaa”


Lenovo is a Chinese company.


So?


I've been stumbling through writing a pile of secure software development lifecycle management and disclosure practices documentation all evening, and desperately needed a bit of levity. This post delivered. Thank you.

Also, I am still not sure if this is a disclosure, performance art, or extremely dry comedy, but it certainly covered all the bases.


> Newton OS users: I guess those are technically Apple Silicon but...

The Newton wasn't really Apple Silicon: the OMP/MP100/MP110/MP120/MP130 ran an ARM610, the eMate 300 ran an ARM710, and the MP2000/MP2100 ran a DEC StrongARM SA-110 CPU.

None of these were designed or manufactured by Apple.


At the time, Apple owned 50% of ARM.

ARM, the company, only existed because Apple wanted them to manufacture a CPU for its Newton project.

Apple might not have designed the ARM610, but they technically owned it.


I did say "designed or manufactured" ... but I'll concede the point that they had some ownership of the 610/710, at least.

On 27 Nov 1990, ARM was formed with Apple owning 43% alongside Acorn (the designer), and VLSI Technology (the manufacturer).

Funny thing: I've found two articles that claim two different purchase prices for that 43%: one $3M [0] and the other $1.5B [1]. That's quite a difference!

[0] https://appleinsider.com/articles/20/06/09/how-arm-has-alrea...

[1] https://www.cultofmac.com/97055/this-is-how-arm-saved-apple-...


> At the time, Apple owned 50% of ARM.

Nope, Apple never owned 50% of ARM.

> ARM, the company only existed because Apple wanted them to manufacture a CPU for it's Newton project.

Who knows what would have happened had Apple not invested but Apple was never ARM's only customer.

> While Apple might not have designed the ARM610, but they technically owned it.

If I own some Apple shares, I'm reasonably sure that doesn't mean that "technically" I own the M1.


ARM was a joint venture between Acorn Computers, Apple Computer and VLSI Technology so it's not that clear cut.


This is the best thing I've seen on the internet for a long time. Hopefully some people (tech journalists and twitter folks) will "fall for it" and learn along the way...


I suppose you could use it to create a "covert suite" of apps for the M1 iPad that talk to each other where they aren't supposed to. Sharing permission X from app 1 with app 2 that isn't supposed to have permission X, etc.


Thankfully Apple can, in principle, statically analyze for this on the iOS App Store, as they do not allow JIT mappings on those devices.


Can they guarantee no JIT code via static analysis as well? Or could someone sneak in a tiny bit of disguised JIT code just to get to this register?

I would assume a huge JITed VM implementation would show up easily in analysis.


They don't provide any way to mark memory as executable.


Well, they do, because they have to run your code :P You just can't make a new page of code and mark it executable.


The OS only makes pages executable if they come from a signed app. There is no way for the app itself to do that.


If it makes it more clear, my comment was mostly "if your code page has a valid signature you can mark it as executable".


Do you need JIT, though? Does Xcode support inline ASM, or various compiler extensions that can read/write a CPU register?


If you put this in your app directly, Apple can just find it and reject it at submission time. If JIT were an option, that wouldn't be enough, because the app could do it at runtime. Since it isn't, there is no way to "hide" something like this from the App Store static analyzer.


Hrm. It seems like inline ASM allows for passing the register name dynamically, though I can't tell for sure. If that's the case, it seems like it would be hard to tell ahead of time, other than "app calls msr/mrs".


Inline assembly must resolve register names at compile time.


The attackers already have whatever data you are intending them to steal/share. The author says this bug is no big deal:

>Can malware use this vulnerability to take over my computer? No.

>Can malware use this vulnerability to steal my private information? No.


This is a bit different: it's about skirting App Store policy/rules.


> So what's the point of this website?

> Poking fun at how ridiculous infosec clickbait vulnerability reporting has become lately. Just because it has a flashy website or it makes the news doesn't mean you need to care.

> If you've read all the way to here, congratulations! You're one of the rare people who doesn't just retweet based on the page title :-)


That's reassuring to read. I opened the page, read a bit of it, pressed play on the video and scrubbed around a bit, got irritated and closed the tab. I figured if it mattered I would wait until better coverage came out.


> It violates the OS security model. You're not supposed to be able to send data from one process to another secretly.

I'd argue this is not the case. What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?


Indeed.

Without this vulnerability, there would still be a million ways to send data between cooperative processes running as different users on Mac OS X.

For example, a process could start subprocesses at a deterministic rate and the other end of the covert link observes how fast the pid counter is going up.

This is a non-vulnerability, because it targets something there was no effort to protect.


It's not really a vulnerability as the FAQ states, but it violates the operating system's own application isolation policies. If you don't want your Facebook app to talk to your Instagram app (e.g. different accounts for different purposes), you should be able, as a user, to block communication between the two. This is a backdoor to circumvent that.

I mean not that anyone has a native Facebook or Instagram app on their device, but just to name an example.


> I mean not that anyone has a native Facebook or Instagram app on their device, but just to name an example.

The M1 is used in the iPad Pro so your example is definitely possible. (or your comment was sarcasm in which case: woosh to myself)


> I'd argue this is not the case. What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?

All of them.

A piece of software able to read my mail but not use the Internet could credibly be a tool to help me index and find my email using search keywords. It promises to not use the Internet, and indeed nm/objdump shows no use of networking tools.

Another piece of software able to monitor RSS feeds I am interested in and alert me to their changes is expected to use the Internet, but not the filesystem, and surely not the part of the filesystem that contains my email. I can use strace/dtruss to verify it never touches the filesystem, and use chroot/jail to keep it honest.

This being said, I agree that "mainstream operating systems" (meaning Windows and macOS, but not perhaps iOS) don't do enough and it might be impossible for them without changing user expectations[1], but I think they're trying. Web browsers disabled high resolution timers specifically to protect against this sort of thing. iOS doesn't permit arbitrary background tasks from running to protect battery and ostensibly privacy. But they could all do better.

[1]: For example, for me high CPU load is a red flag - a program that does this to me regularly gets put into a VM so that I can mess with its time-- Zoom now loses about a minute every three if it's not focused which is annoying because it messes with the calendar view, but I'm pretty sure it can't do anything else I don't want it to. Who should do this work? My operating system? Zoom? Neither will do it if users don't demand it.


So my point as it applies to this example: the email indexing program could communicate towards the rss program using cpu or storage load spikes. And no widely used multitasking OS tries to prevent this.


Yes, exactly. Multics actually tried, here's a memo from 1974 discussing the issue: https://multicians.org/mtbs/mtb696.html

Paged shared libraries, signalling by ramping up and down CPU usage, there are an enormous number of possible covert channels.


> What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?

The answer will depend on whether you consider Multi-Level Security (MLS) https://en.wikipedia.org/wiki/Multilevel_security "mainstream". It's certainly a well-established approach if only in an academic sense, and the conflux of new use cases (such as secretive, proprietary "apps" being expected to manage sensitive user data) and increasingly-hard-to-mitigate info disclosure vulnerabilities has made it more relevant than ever.


ELI5, anyone.

Are the chip registers not protected? What's the mechanism that's allowing this data sharing to happen?


There are two bits of a CPU register that are shared between all of its processes and that any process can write to. The result is that two sandboxed processes that are supposed to be totally isolated from each other can use this to communicate anyway. One example of how this can be exploited is cross-app tracking: if you told one app your name and another your location, they could secretly communicate with each other so both apps end up with both pieces of information.
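To make the mechanics concrete, here's a toy model of such a channel: two "processes" sharing nothing but a 2-bit cell, clocked in perfect lockstep for simplicity. A real exploit needs clock recovery and error correction, and all names here are made up for illustration.

```python
# Toy model of the channel: two "sandboxed" endpoints share nothing
# except a 2-bit cell standing in for the user-writable bits of
# s3_5_c15_c10_1. Perfect lockstep is assumed for simplicity.

class SharedRegister:
    def __init__(self):
        self.bits = 0

    def write(self, value):
        self.bits = value & 0b11   # only two bits are writable

    def read(self):
        return self.bits

def send(reg, data, log):
    # Transmit each byte as four 2-bit symbols, most-significant first;
    # `log` stands in for the receiver sampling the register in lockstep.
    for byte in data:
        for shift in (6, 4, 2, 0):
            reg.write((byte >> shift) & 0b11)
            log.append(reg.read())

def receive(log):
    out = bytearray()
    for i in range(0, len(log), 4):
        byte = 0
        for sym in log[i:i + 4]:
            byte = (byte << 2) | sym
        out.append(byte)
    return bytes(out)

reg, log = SharedRegister(), []
send(reg, b"secret", log)
assert receive(log) == b"secret"
```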


>they could secretly communicate with each other so both apps end up with both pieces of information

The could also just both ping a server to exchange data.


One has access to the internet, the other has not (but has less info).


Why couldn't a future OS update add access control to these registers?


Because the OS has no say. A running program issues an assembly instruction to the CPU to read or write this register, and the CPU complies.

For the OS to have a say, the CPU would need to provide a way where the OS tells it (usually by setting certain values in other registers) that the CPU should not allow access, at least under certain circumstances.

The article actually does go into certain situations where the access is more restricted (search for "VHE"), but also in how that does not really apply here.


The OS can scan the program for instructions that access these bits. If necessary on a per-basic-block basis.
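The check itself is cheap: the offending register has a fixed encoding, so scanning executable pages is one masked compare per 32-bit word. A sketch in Python; the encoding constants are reconstructed by hand from the public ARM ARM encoding rules, so treat them as my derivation rather than vetted tooling:

```python
import struct

# AArch64 MRS/MSR system-register encoding (reconstructed from the
# public ARM ARM): bits [31:22]=1101010100, [21]=L (1 for MRS),
# [20]=1, [19]=o0, [18:16]=op1, [15:12]=CRn, [11:8]=CRm, [7:5]=op2,
# [4:0]=Rt, with op0 = 2 + o0.
# s3_5_c15_c10_1 means op0=3 (o0=1), op1=5, CRn=15, CRm=10, op2=1.
PATTERN = ((0b1101010100 << 22) | (1 << 20) | (1 << 19)
           | (5 << 16) | (15 << 12) | (10 << 8) | (1 << 5))
MASK = 0xFFFFFFFF & ~((1 << 21) | 0x1F)  # ignore L (MRS vs MSR) and Rt

def find_register_accesses(code: bytes):
    """Byte offsets of A64 words that decode as MRS/MSR s3_5_c15_c10_1."""
    hits = []
    for off in range(0, len(code) - 3, 4):  # A64 insns are 4 bytes, aligned
        (word,) = struct.unpack_from("<I", code, off)
        if word & MASK == PATTERN:
            hits.append(off)
    return hits

# nop; mrs x0, s3_5_c15_c10_1; ret
code = struct.pack("<3I", 0xD503201F, 0xD53DFA20, 0xD65F03C0)
assert find_register_accesses(code) == [4]
```

Masking out the L bit and Rt catches both reads and writes with any destination register in one pass.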


Of course, this only works if you can't introduce new code without the kernel noticing.


Yes, you can introduce new code but the kernel should also watch for that (JIT compilation etc.) and check the resulting code. It's quite involved, and the whole process looks more like a sandbox or emulator, but it's possible.


Doing this performantly is going to be very prohibitive.


Perhaps (depends also on CPU support), but on the other hand: in today's world with untrusted apps, the kernel will have to do some sandboxing anyway.


Could the OS intentionally clear or write dummy data to the register instead?


No. The author explained why not:

> originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel.


You've got to access those bits through some instructions, though. What if the command pipeline filters those instructions?


Can you elaborate what you mean? What is the "command pipeline" here?



You are working here with CPU registers. At this point the OS has no say, it’s a hardware bug. Not a particularly serious one though.


I didn't say the OS filters the pipeline. Modern CPUs have a lot of updateable microcode, including how it handles its command pipeline.


There is no indication that the M1 has updatable microcode, nor any other features that might allow such mitigation. (If it did, Apple would've fixed it; I did give them a 90 day disclosure warning and they're not lazy about fixing actual fixable bugs.)


Aw - that was what I was worried about - without updatable microcode :nuke:.


Modern x86/x64 CPUs. The M1 might not have updatable microcode.


Apple might consider updatable microcode a vulnerability in itself. Certainly a double-edged sword.


Because the CPU doesn't provide a practical means to do so.


Doesn't the kernel control CPU access?


There are more specific answers here, but in general the answer to this question is "only partly". The kernel is what initially gives your process a time slice on the CPU, by setting an alarm for the CPU to return control to the kernel at the end of the time slice, and then just jumping into your code. During your time slice, you can do anything you want to the CPU, and in general only interrupts (timer interrupts, hardware interrupts, page faults, etc.) will cause the kernel to get involved again. There are some specific features that CPU designers add to give extra control to the kernel, but that's a feature of the CPU, and it applies only when the CPU has explicitly added that type of control.


> The kernel is what initially gives your process a time slice on the CPU, by setting an alarm for the CPU to return control to the kernel at the end of the time slice, and then just jumping into your code.

Somewhat critically, it will also drop down to EL0.


Registers aren't resources you access through syscalls, there's no way for the kernel to control them unless you're running under virtualization or the CPU architecture specifically allows access control for the register. (As the site notes, virtualization allows controlling access to this register)


Can kernel scan each page it maps as executable and return an error if it finds instructions interacting with the 'bad' register? Assuming the kernel requires executable pages to be read-only (W^X), this may even be doable (but probably very very slow).


> Assuming the kernel requires executable pages to be read-only (W^X)

Which macOS's kernel doesn't.


It does require that, but it allows flipping between RX and RW at will (for JITs), and the M1 actually has proprietary features to allow userspace to do this without involving the kernel, so the kernel couldn't re-scan when those flips happen (plus it would kill performance anyway).

Plus, as I said above, this is prone to false positives anyway because the executable section on ARM also includes constant pools.


Can't a MAP_JIT region be writable by one thread and executable by a different thread at the same time?


Ah, yes, I forgot about that. So indeed there is no non-racy hook point for the kernel to do such a check, even if it made sense and the RX/RW switch went through the kernel, which it doesn't.



That link confirms that it can:

> Because pthread_jit_write_protect_np changes only the current thread’s permissions, avoid accessing the same memory region from multiple threads. Giving multiple threads access to the same memory region opens up a potential attack vector, in which one thread has write access and another has executable access to the same region.


The kernel doesn't get a say in what instructions a userspace program can run, other than what the CPU is designed to allow it to control. The bug is the CPU designers forgot to allow it to control this one.


Apple could "mitigate" this by refusing to sign code interacting with s3_5_c15_c10_1, I guess.


Only on iOS. On macOS, JITs are allowed (as is ad-hoc signed code if you click through the warnings).

However, this would be prone to false positives, as constant pools are in the executable section on ARM.


Let's say someone submits a malicious keyboard with the bad instructions hidden in a constant pool.

Apple can't just scan for a bad byte sequence in executable pages because it could also represent legitimate constants used by the program. (not sure if this part is correct?)

If so, doesn't that make detection via static analysis infeasible unless LLVM is patched to avoid writing bad byte sequences in constant pools? Otherwise they have to risk rejecting some small number of non-malicious binaries, which might be OK, depending on the likelihood of it happening.
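Concretely, the ambiguity looks like this (0xD53DFA20 is a hand-derived encoding of `mrs x0, s3_5_c15_c10_1`, so treat it as an assumption):

```python
import struct

MRS_S3_5_C15_C10_1 = 0xD53DFA20  # mrs x0, s3_5_c15_c10_1 (hand-derived)

# One copy of the word is an instruction the program executes; the
# other is a literal-pool constant it merely loads as data. A byte
# scan over the executable section flags both identically.
section = struct.pack("<2I", MRS_S3_5_C15_C10_1,   # real access
                      MRS_S3_5_C15_C10_1)          # innocent constant
hits = [off for off in range(0, len(section), 4)
        if struct.unpack_from("<I", section, off)[0] == MRS_S3_5_C15_C10_1]
assert hits == [0, 4]  # the scan cannot distinguish them
```

So a naive rejection rule would also reject any binary that happens to contain that 32-bit constant as data, unless the toolchain is taught to avoid emitting it in the text section.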


Doesn't Rice's theorem mean that they cannot?


I believe that Rice's theorem is about computability, not about whether or not it is possible to validate which CPU instructions a program can contain.

With certain restrictions, it is possible to do this: Google Native Client [1] has a verifier which checks that programs it executed did not jump into the middle of other instructions, forbade run-time code generation inside of such programs, etc.

[1]: https://en.wikipedia.org/wiki/Google_Native_Client


Jumping in the middle of other instructions is not a problem on ARM.


Yes, but then you're not just blocking instructions that touch s3_5_c15_c10_1; you're also blocking a bunch of other kinds of instructions too.


(What other kinds of instructions? Genuinely asking.)

I don't think Rice's Theorem applies here. As a counterexample: On a hypothetical CPU where all instructions have fixed width (e.g. 32 bits), if accessing a register requires the instruction to have, say, the 10th bit set, and all other instructions don't, and if there is no way to generate new instructions (e.g. the CPU only allows execution from ROM), then it is trivial to check whether there is any instruction in ROM that has bit 10 set.

The next part I'm less sure how to state it rigorously (I'm not in the field): In our hypothetical CPU, I think disallowing that instruction either lets you remain being Turing Complete or not. In the former case, it's still the case that you can compute everything a Turing Machine can.


You'd have to add one extra condition to your hypothetical CPU: that it can't execute unaligned instructions. Given that, then yes, that lets you bypass Rice's theorem, even though it is indeed still Turing-complete.

But the M1 does have a way to "generate new instructions" (i.e., JIT), so that counterexample doesn't hold for it.


Yes, indeed, I should have stated "cannot execute unaligned instructions". Or have said 8 bit instead, then it would be immediately obvious what I mean. (You cannot jump into the middle of a byte because you cannot even address it.)

But I wanted to show how Rice's Theorem does not generally apply here. You can make up other examples: A register that needs an instruction with a length of 1000 bytes, yet the ROM only has 512 bytes space etc...

As for JIT, also correct (hence my condition), though that's also a property of the OS and not just the M1 (and on iOS for example, it is far more restricted what code is allowed to do JIT, as was stated in the thread already).


With the way Apple allows implementation of JIT on the M1 (with their custom MAP_JIT flag and pthread_jit_write_protect_np) it is actually possible to do this analysis even with JIT code. Since it enforces W^X (i.e. pages cannot be writable or executable at the same time) it gives the OS opportunity to inspect the code synchronously before it is rendered executable. Rosetta 2’s JIT support already relies on this kind of inspection to do translation of JIT apps.



M1 enforces W^X through SPRR, which does not involve the kernel.


It does when running native ARM code (but not x86 code), but AFAIK nothing stops Apple from changing this to being kernel mediated by updating libSystem in the ARM case as well. Of course I doubt they would take the performance hit just to get rid of a this issue.


There's three cases:

1) the program does not contain an instruction that touches s3_5_c15_c10_1

2) the program contains an instruction that touches s3_5_c15_c10_1, but never executes that instruction

3) the program contains an instruction that touches s3_5_c15_c10_1, and uses it

Rice's theorem means we cannot tell whether a program will touch the register at runtime (as that's a dynamic property of the program). But that's because we cannot tell case 2 from case 3. It's perfectly decidable whether a program is in case 1 (as that's a static property of the program).

Any sound static analysis must have false positives -- but those are exactly the programs in case 2. It doesn't mean we end up blocking other kinds of instructions.


Couldn't there be another register that controls whether access to the problematic register in EL0 is allowed, though?


Sounds like this is by design and not really a newly discovered vulnerability. Maybe more of a discovery of deceptive advertising/documentation? Which is to say that Apple's engineers are reading this as non-news.


There is a small bit of memory that all programs on your computer share that isn’t protected in any way. If two misbehaving programs on your computer wanted to communicate in a really really secret way, they could use it.

If you don’t have misbehaving programs on your computer that want to secretly communicate than it doesn’t matter.


> So what's the real danger?

> If you already have malware on your computer, that malware can communicate with other malware on your computer in an unexpected way.

> Chances are it could communicate in plenty of expected ways anyway.



Your computer might spontaneously combust.


Holy shit, just as I thought we’ve run out of novel ways of playing Bad Apple, here we are...



How about randomising/resetting these bits from the kernel whenever there is a syscall? Not a great workaround, but this should limit the effectiveness of leaking. Yeah, there will be a tiny perf hit due to the extra register read and write.


> Wait, didn't you say on Twitter that this could be mitigated really easily?

> Yeah, but originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel. Other than running in EL1/0 with TGE=0 (i.e. inside a VM guest), there's no known way to block it.

In other words: this register is shared between cores, so if the two processes are running simultaneously on different cores, they can communicate by reading & writing directly to & from this register, without any operating system interaction.


Unfortunately, you can use this to send thousands of bits between syscalls, so the simplest error correction would fix that, with very little effort or overhead.
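For scale: even a trivial repetition code with a per-bit majority vote survives occasional flips. A sketch, where the noise model and repetition count are made up rather than taken from the PoC:

```python
# Sketch of trivial forward error correction over the 2-bit channel:
# repeat each symbol and take a per-bit majority vote.

def majority_decode(copies):
    n, out = len(copies), 0
    for bit in (0, 1):
        ones = sum((c >> bit) & 1 for c in copies)
        out |= (ones * 2 > n) << bit
    return out

def encode(symbols, repeat=3):
    return [s for s in symbols for _ in range(repeat)]

def decode(stream, repeat=3):
    return [majority_decode(stream[i:i + repeat])
            for i in range(0, len(stream), repeat)]

stream = encode([3, 1, 0, 2])   # each 2-bit symbol sent three times
stream[1] ^= 0b10               # two copies corrupted in transit
stream[7] ^= 0b01
assert decode(stream) == [3, 1, 0, 2]
```

At thousands of raw bits per quantum, paying a 3x or 5x redundancy overhead still leaves plenty of usable bandwidth.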


The demo already uses error correction (I'm not sure exactly what causes the errors, but I'm guessing the processes sometimes end up briefly scheduled on the other core cluster)


> in violation of the ARM architecture specification

> Apple decided to break the ARM spec by removing a mandatory feature

Is there a page documenting all incompatibilities / violations of the ARM architecture specification by the M1?


I wouldn't be surprised if the language gets loosened in the next revision of the standard, as it has in the past.


It seems like there's a partial mitigation available to the OS here. When scheduling a task, write a random value to the two user-writable bits. When the task is unscheduled, if the bits do not match, terminate the task. This effectively makes writing to the register an OS-enforced illegal operation with a 75% chance of being caught within 10 ms if the channel is being used at full bandwidth. (The writer can reduce the chance of it being caught proportional to reduced use of channel bandwidth by resetting it to the OS-chosen value after a bit is transmitted.) The reader can't be detected this way, but since the channel requires cooperation between the writer and reader, catching either is fine. Not a perfect fix, but would help, and would also give visibility into whether this is used in the wild -- e.g., report to Apple via crash reporting mechanism if a process is terminated this way, which would allow prompt discovery of app store apps that abuse the channel.
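To put rough numbers on the 75% figure, here's a seeded Monte Carlo sketch of the scheme. The assumption that a covert writer leaves a uniformly random value in the register is mine; real traffic patterns may be easier or harder to catch.

```python
import random

# Each quantum, the OS writes two random bits on schedule-in and
# compares on schedule-out; a mismatch means the process touched the
# register and gets terminated.

def quantum(rng, process_writes):
    canary = rng.randrange(4)        # OS-chosen 2-bit value
    reg = canary
    if process_writes:
        reg = rng.randrange(4)       # covert writer clobbers the bits
    return reg != canary             # True = caught and terminated

rng = random.Random(42)
caught = sum(quantum(rng, True) for _ in range(10_000))
assert 0.70 < caught / 10_000 < 0.80   # roughly the quoted 75% per quantum

honest = sum(quantum(rng, False) for _ in range(10_000))
assert honest == 0                      # no false positives for clean code
```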


Could this be fixed with a microcode update (he asks, not really having any idea what microcode is)?


Does the M1 even have microcode updates? I haven't seen anything pointing to that yet.


I don't think so, no. If it has microcode it's probably burned into sequencer tables, not updatable. I was kind of hoping Apple would have some chicken bit register up their sleeve as a last resource fix (e.g. "trap on instruction encodings matching this mask"), but given that they seem to have no useful mitigation for it, I don't think they do.


Is it possible Apple have the silicon functionality to fix this, but have decided it isn't worth fixing?

After all, process isolation between cooperating processes is nearly impossible to do. If Apple close this loophole, there will be other lower bandwidth side channels like spinning up the fan in Morse code and the other process notices the clock speed scaling up and down...


It doesn't really make sense not to fix it if they can in fact do so easily.


Except software-silicon patches usually have a limited number of filters, patch slots, etc. Might not be worth using one for this.


They're using zero so far [0], and until they need it for something else it wouldn't make sense not to use it for this. The CPU tunables aren't fuses or anything, the OS configures them (m1n1 in our case)

[0] https://github.com/AsahiLinux/m1n1/blob/main/src/chickens.c


Sorry if I missed it, but what is the defined purpose of the s3_5_c15_c10_1 register? Or is it just general purpose?


It's an implementation-defined register, which means it's up to Apple to define it. We have no idea what it does; we haven't observed any visible effects from flipping those bits. Given that it's per-cluster, we can infer that it has something to do with cluster-specific logic. Perhaps memory interface or power control.

There are hundreds of Apple implementation-defined registers; we're documenting them as we learn more about them [0] [1] [2]

[0] https://github.com/AsahiLinux/m1n1/blob/main/tools/apple_reg...

[1] https://github.com/AsahiLinux/docs/wiki/HW%3AARM-System-Regi...

[2] https://github.com/AsahiLinux/docs/wiki/HW%3AARM-System-Regi...


I googled it for you and err, came up blank, there's just two code references in some ASM code, the rest points to this resource. Weird, I would have thought things like this would have public documentation.



It’s not a standard ARM register; it’s implementation-defined and nobody gave it a meaningful name. It appears to do nothing much.


Then why not unofficially standardize its use in GCC and Clang's register-pooling subsystem?


Because GCC and Clang don't get to decide what hardware does with the register…


> OpenBSD users: Hi Mark!

Ok I nearly fell out of my chair. A+


Heh, it's "baked in", heh.

In all seriousness, I wonder what the actual issue is.

Could anyone comment as to the implications of only supporting a Type 2 hypervisor that is (as said on the site) "in violation of the ARMv8 specification"?


The implications are just that OSes that assume otherwise won't run; Linux used to work (by chance) until a patch that just about coincided with our project went in that used the non-VHE ("type 1") mode by default, which broke it, and then we had to add an explicit workaround for the M1.

It's just a very unfortunate coincidence that precisely that support would allow this bug to be trivially mitigated on Linux. (Wouldn't help macOS, as they'd have to implement this from scratch anyway; it's just that existing OSes that support this mode could use it).

The actual issue is just what I described: the hardware implementation of this register neglects to check for and reject accesses from EL0 (userspace). It's a chip logic design flaw. I don't know exactly where it is (whether in the core/instruction decoder, or in the cluster component that actually holds the register, depending on where they do access controls), but either way that's what the problem is.


One implication is it prevents solutions to things like this.

This one is a minor side note but there could be other vulnerabilities that could be resolved if the specifications were followed (I assume).


You can still solve the issue in VHE mode, since you can still implement a Type 1 hypervisor in VHE mode. It's just that, well, nobody does that, because why would they? That's what non-VHE mode is for.

So it's not that not following the spec prevents the workaround, it's just that had they followed the spec it would just take a single kernel command line argument (to force non-VHE mode) to fix this in Linux, while instead, now we'd have to make major changes to KVM to make the non-VHE code actually work with VHE, and really nobody wants to do that just to mitigate this silly thing.

Had this been a more dangerous flaw (e.g. with DoS or worse consequences), OSes would be scrambling to make major reworks right now to mitigate it in that way. macOS would have to turn its entire hypervisor design on its head. Possible, but not fun.


Are there any legal ramifications for them not following the spec?


Do Apple actually maintain a list of errata?


Internally they certainly do.


I couldn't find one from search and this Linux kernel documentation: https://www.kernel.org/doc/html/latest/arm64/silicon-errata.....


I also had a look a while back but I was hoping they had one for Apple devs or whatever.

Some documentation would be nice...


Actually not a bad song [0]. Thanks to whoever made this, I guess :D

[0]: https://www.youtube.com/watch?v=i41KoE0iMYU


I had to use that one for this demo for obvious reasons, but if I'm allowed the shameless plug, I actually make my own music in the same genre (Touhou rearrangements) [0]. I'm actually very much looking forward to moving my music production to M1 and seeing what the real-time performance is like, though that will depend on us having at least a usable Rosetta-like thing on Linux to run x86 Windows apps (which will allow me to bridge the few x86 Windows plug-ins I rely on with yabridge, as I do today on x86) :-)

[0] https://www.youtube.com/playlist?list=PL68XxS4_ek4afs0eXwRiY...


Wow I can't believe I just realized why Bad Apple is used for this, 2-3 hours after reading the article…


That's awesome! I'm definitely thinking about getting an M1 for realtime keys, though I'm all set up in Logic/MainStage so I'll probably stick with macOS for now :)

On the Linux side, would qemu user-mode emulation work for that (maybe with a patch to take advantage of the M1's switchable-memory-order thing)?


I think qemu would work fine, but it's pretty slow, so I'm hoping it can either be improved or another project more focused on this use case can do it better.

If nothing else though, I plan to expose at least the TSO feature of the M1 so qemu can reduce the overhead of its memory accesses.


Awesome! I'll take a listen and play it during my all-nighter tonight. Best of luck getting that M1 Mac working, I'm really liking mine so far!


It seems like a single bit available to all apps that no one is really using right now. I wonder if an easy software mitigation could be just polluting it intentionally.


Given that the M1 is in the iPad Pro now, I think there could well be apps seriously exploiting this to circumvent Do Not Track in iOS 14.5.


Thankfully, Apple should be able to statically analyze apps to look for this on App Store submission, as the App Store does not allow dynamic code (JITs).


How? They’d have to communicate with an app that does have those permissions


At the core, DO NOT TRACK prevents Apps having access to the Advertising Identifier. So different Apps cannot aggregate their analytics data about the users.

This vulnerability enables different Apps to communicate a super cookie for cross-app tracking. A possible exploit would be to implement this feature in an AD SDK to be used by different Apps.


But does it actually do anything? Apps can surely identify users by other means, e.g. IP, behaviour...


Marcan's posts, comments, and now websites are always fun to read, and this one is no exception.


Not to miss his YouTube channel where he does live hacking sessions. Invaluable for beginners.

https://www.youtube.com/watch?v=hLQKrEh6w7M


I love the two puns in the title: "M1RACLES: Bad Apple!! on a bad Apple (M1 vulnerability)"


well this is a new genre of satire


Sounds like a killer feature. They're going to announce this at wwdc21.


No, definitely a bug, but I expect that they will announce the swarm of nanobots they plan to release to fix all the affected chips.


Is the risk that malicious software can be split into multiple, obfuscated components?

Without such a silicon vulnerability the malicious process would need all its components within a single process/image?


"So you're telling me I shouldn't worry? Yes."


Security through obscurity + walled garden. Brilliant move.


Out of curiosity, which public IRC channel was this being discussed in, before it was understood to be a bug? That sounds like a fun channel.


Ah, looking more into it, I'm going to guess it was #asahi or #asahi-dev.


Yup, here's the log if you're curious :-)

https://oftc.irclog.whitequark.org/asahi/2021-02-24#29220558


I love the choice of music. Very appropriate.


> Am I affected?

> • OpenBSD users: Hi Mark!

Yes, "Hi Mark", whoever you are and no matter that I'm not an OpenBSD user.


Context: Mark Kettenis has been porting OpenBSD to the M1 SoC, and is probably the only current user of that code.


Precisely :)


That was my initial suspicion as well.

If you are the author, thanks for the read.

Twitter handle matches.


I took it as a reference to The Room. Isn't it? :|


I'm not aware of the referenced material. :)


Oooo! Depending on your taste you're in for either a very boring movie or the experience of a lifetime. Complete with an associated rabbit hole of mystery surrounding the director: https://en.wikipedia.org/wiki/The_Room

If you do end up watching and liking it, there is book and a film about the filming of the film to enjoy :D https://en.wikipedia.org/wiki/The_Disaster_Artist



Thanks both of you for the info.


Real? I'm confused by a particular infosec quote.


I wish all CVE were written this way :-)


That demo made me vomit

