I was confused by the title: by "Ring-2" it means "Ring -2" (minus two), which is "traditionally" SMM (System Management Mode), a horrible relic that lets your BIOS/UEFI silently steal the CPU from the OS to implement janky drivers or workarounds directly in the firmware (occasionally causing all sorts of mayhem).
(Actual Ring 2 is very rarely seen, so perhaps I should have known!)
It's just terminology. When a new, more privileged entity is created, it is sometimes described as a negative ring.
There was a time when people thought that if we put the secure code in a lower ring, it could protect the rest of the system. With virtualization, the hypervisor sits in ring -1, which is technically not a ring but a mode called VMX root operation, entered after VMXON. This enables things like the blue pill attack, where a malicious layer presents the hypervisor itself with a false image of the underlying physical hardware. You can find the same pattern in ARM TrustZone, where the secure code is repeatedly broken.
It's the same reason that the nominal thrust level on the Space Shuttle Main Engines is 104.5%[1].
No, not 100% (it was originally), 104.5%. Why? Because you don't go back and change all your rules and documentation after every subsequent development in the field; that causes unnecessary confusion and errors down the road.
I don't think there actually are, in the sense that there isn't a register somewhere with these values that gets compared against, the way there is with rings 0-3. I've only heard this in the context of reverse engineers describing the layers of access that undocumented parts of a modern CPU system have; I think it's just a made-up analogy. There is presumably some proprietary documentation out there with more official names.
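For contrast, here is what "a register that gets compared against" looks like for the real rings: on x86 the current privilege level is the low two bits of the selector loaded in CS. A minimal sketch; the two selector constants are the values I'd expect on a typical x86-64 Linux system and are assumptions for illustration, not something from the paper.

```python
def privilege_bits(selector: int) -> int:
    """Return the low two bits of an x86 segment selector.

    For the selector currently loaded in CS, these bits are the CPL (ring 0-3).
    """
    return selector & 0b11


# Assumed values for a typical x86-64 Linux system (illustrative only).
ASSUMED_KERNEL_CS = 0x10
ASSUMED_USER_CS = 0x33

print(privilege_bits(ASSUMED_KERNEL_CS))  # 0
print(privilege_bits(ASSUMED_USER_CS))    # 3
```

There is no equivalent two-bit field that ever reads "-1" or "-2"; those levels are separate CPU modes, which is why the negative numbers are only an analogy.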
Arm (AArch64) Exception Level 0 corresponds to Ring 3 of x86.
Arm (AArch64) Exception Level 1 corresponds to Ring 0 of x86.
Arm (AArch64) Exception Level 2 corresponds to the Hypervisor level a.k.a. Ring -1 of x86.
Arm (AArch64) Exception Level 3 corresponds to the System Management Mode a.k.a. Ring -2 of x86.
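The same correspondence as a small lookup table, in case that is easier to scan. This is only a sketch of the analogy in the four lines above; the names `ExceptionLevel` and `X86_ANALOGUE` are made up for the example, and the mapping is informal rather than anything either architecture defines.

```python
from enum import IntEnum


class ExceptionLevel(IntEnum):
    """AArch64 exception levels; a higher number means more privilege."""
    EL0 = 0
    EL1 = 1
    EL2 = 2
    EL3 = 3


# Informal x86 counterpart of each level, as listed above.
X86_ANALOGUE = {
    ExceptionLevel.EL0: "Ring 3 (user)",
    ExceptionLevel.EL1: "Ring 0 (kernel)",
    ExceptionLevel.EL2: "Ring -1 (hypervisor)",
    ExceptionLevel.EL3: "Ring -2 (SMM)",
}

for level in ExceptionLevel:
    print(f"{level.name} ~ {X86_ANALOGUE[level]}")
```

Note the numbering runs in opposite directions: on Arm a bigger number is more privileged, on x86 a smaller one is.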
Fortunately, in Arm EL3 the same instruction set is used as in any other level, unlike in x86, where SMM is entered in an obsolete 16-bit, real-mode-like environment inherited from the 8086, so for compiling programs that will be executed in SMM you have to use a special tool set.
Unfortunately, both the Arm EL3 and the x86 SMM allow the manufacturers of computing devices to do things that are either stupid or in direct contradiction with the interests of the owners of the devices and the owners may not be able to do anything to correct this, unless they can exploit vulnerabilities like the one that has now been patched by AMD.
There are no valid arguments for the existence of SMM and EL3 and the fact that they are not forbidden by law is a disgrace for the computing industry.
Arm EL3 was created as an imitation of the Intel SMM. The Intel SMM was created because Microsoft was too lazy to introduce the required power management functions in the Windows and MS-DOS operating systems, so they passed the task to the motherboard and laptop manufacturers, for whom Intel provided SMM to enable this.
Ring -1 is the host system / virtual machine manager when the ring 0 OS is running as a VM. Ring -2 is more privileged than that since it can interrupt Ring -1 and can affect the execution of VM instructions.
Rings 1 and 2 are still very much present in your desktop x86 machine; your OS just doesn't use them. X86-S will remove them, but no CPUs implement that reduced architecture, and Intel has made no public announcements about future generations that will.
Existing supervisors use 0, so when x86 virtualization was invented they added -1 for hypervisors. And so... are the monitors running on ring -2 ultravisors?
> Please refer to your OEM for the BIOS update specific to your product.
Unless you are running hardware that is also used by powerful hosting providers (some of which care about security), these mitigations will not reach many systems. I checked a few "client" samples: MSI seems to have provided updated binary blobs, ASRock has provided some, Gigabyte provided broken ones first and then backdated the new ones, and ASUS (ROG/TUF/CSM) and Biostar customers are still waiting.
The paper asks "why does this feature exist?" - probably they haven't gone far enough back in history (note: I've worked on x86 clones, so I understand this stuff in far too great a detail).
Originally, memory on x86 systems was in VERY short supply. SMM memory was the DRAM that the VGA window in low memory (0xa0000) overlaid. Normal code couldn't access it because the video card claimed memory accesses to that range of addresses, so when the CPU was in SMM the north bridge switched data and instruction accesses in that range to go to DRAM rather than the VGA card. That's great, except remember that SMM was used for special setup stuff on laptops, and sometimes that code needs to be able to display on the screen. That's what this special mode was originally for: so that SMM code can display on the screen. (It's also likely why SMM graphics were so primitive: you're switching in and out of this mode for every pixel you write.)
Sometimes it's nice to see SMP causing headaches for the "bad" guys for a change. They did eventually work around it, but half of this paper is working around problems where the second core gets out of sync and crashes as soon as they try to exploit the system.
The most interesting part, to me, was that entering SMM pauses all cores at once, instead of doing the work in a single core like normal interrupts. That sounds like a performance killer, and I hope entering SMM is really rare in modern systems.
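If you want to check how rare it actually is on your own box: Intel CPUs expose a running SMI counter in an MSR (`MSR_SMI_COUNT`, 0x34). Below is a rough sketch for Linux, assuming the `msr` kernel module is loaded and you run it as root. That MSR number is Intel-specific, so treat it as an assumption that does not carry over to the AMD parts this thread is about. The function names are mine.

```python
import os
import struct

MSR_SMI_COUNT = 0x34  # Intel-specific; assumed not to be valid on AMD CPUs


def decode_smi_count(raw: bytes) -> int:
    """Decode an 8-byte MSR read; the SMI count is the low 32 bits."""
    (value,) = struct.unpack("<Q", raw)
    return value & 0xFFFFFFFF


def read_smi_count(cpu: int = 0) -> int:
    """Read the SMI counter via /dev/cpu/N/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects which MSR is read.
        return decode_smi_count(os.pread(fd, 8, MSR_SMI_COUNT))
    finally:
        os.close(fd)
```

Calling `read_smi_count()` twice some minutes apart and subtracting gives the number of SMIs taken in between.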
My information is pretty out of date, but when TPMs first arrived on the scene there was a fair bit of talk about using them as secure enclaves where you could do honest to god "trusted computing" with a fully verified stack on ordinary PC hardware. This largely didn't work out because TPMs were slow and every time you tried to do it you basically stalled out the rest of the machine, so once execution came back to the CPU everything was out of sync and all of the attached hardware like network cards and video cards crashed or froze. TPMs ended up only being useful as a place to store keys and occasionally cryptographically sign small amounts of data.
That said, the SMM can probably be a little less intrusive if it needs to be. Like it doesn't have to freeze the cores if all it is doing is reading your bitcoin addresses and passphrase out of memory, just stalling the memory bus for a moment or two.
Android pKVM hypervisor tries to constrain vendor-specific Arm EL3 TrustZone (~x86 SMM Ring-2) on Pixel 7/8/9, https://lkml.org/lkml/2022/11/16/1241
pKVM's primary goal is to protect guest pages from a compromised host by enforcing access control restrictions using stage-2 page-tables. Sadly, this cannot prevent TrustZone from accessing non-secure memory, and a compromised host could, for example, perform a 'confused deputy' attack by asking TrustZone to use pages that have been donated to protected guests. This would effectively allow the host to have TrustZone exfiltrate guest secrets on its behalf, hence breaking the isolation that pKVM intends to provide..
FF-A provides (among other things) a set of memory management APIs allowing the Normal World to share, donate or lend pages with Secure. By monitoring these SMCs, pKVM can ensure that the pages that are shared, lent or donated to Secure by the host kernel are only pages that it owns.. the robustness of this approach relies on having all Secure Software on the device use the FF-A protocol for memory management transactions with the normal world, and not use vendor-specific SMCs that pKVM is unable to parse.
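The ownership check described in that quote boils down to something like the following. This is a toy model and not pKVM code: the real thing lives in the kernel, is written in C, and parses FF-A memory transaction descriptors. `Owner`, `host_may_share` and the page table dict are all hypothetical names for illustration.

```python
from enum import Enum, auto
from typing import Dict, Iterable


class Owner(Enum):
    HOST = auto()
    GUEST = auto()       # page donated to a protected guest
    HYPERVISOR = auto()


def host_may_share(page_owner: Dict[int, Owner], pages: Iterable[int]) -> bool:
    """Allow a host share/lend/donate request to Secure only if the host
    currently owns every page in the request. Unknown pages are refused."""
    return all(page_owner.get(page) is Owner.HOST for page in pages)


# Hypothetical ownership table, keyed by page address.
owners = {0x1000: Owner.HOST, 0x2000: Owner.GUEST}

print(host_may_share(owners, [0x1000]))          # True
print(host_may_share(owners, [0x1000, 0x2000]))  # False
```

The second call is the confused-deputy case: the host asks TrustZone to touch a page it has already donated to a guest, and the hypervisor refuses before the SMC is forwarded. As the quote says, this only helps if Secure never accepts memory through some vendor-specific SMC that the check cannot parse.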
> Because of its traditionally unfettered access to memory and device resources, SMM is a known vector of attack for gaining access to the OS and hardware.. One could have perfect code in SMM and still be affected by behavior like trampolining into secure kernel code.. Isolating SMM is implemented in three parts: OEMs implement a policy that states what they require access to; the chip vendor enforces this policy on SMIs; and the chip vendor reports compliance to this policy to the OS.
it's funny that they have to debunk the "root is root, why would AMD patch this" that goes around every time there's a serious issue that allows guest-root escape from virtualized containers.
the same thing happened with the ryzenfall/masterkey exploit, where people were just in utter denial there was an actual exploit there, because root is root! People literally spent more time talking about who released it and their background image than the actual exploit. AMD obviously cannot have exploits, that's only an intel thing. /s
PS: they did release technical details once the mitigations had been released etc. And these were released to tech researchers earlier, and proof of concepts were shown etc. https://youtu.be/QuqefIZrRWc?t=1005
And, like, the fact that AMD released an urgent patch for it should kind of speak to the severity of the issue in the first place. AMD doesn't patch "sudo lets you do root things", obviously, so it necessarily must have been more than that, and this was obvious even at the time. But we have to go through this dance with literally every single AMD exploit.
AMD has an unpatched exploit in all Zen3 and below processors that leaks data from kernel at a faster rate than meltdown did. It was discovered by the same researchers that discovered meltdown. AMD has chosen to leave that unpatched, and put out a weaselly deflection about "it doesn't cross address boundaries" but they also still refuse to turn KPTI on by default because it would hurt their benchmarks. And without KPTI there is no address boundary to cross, that's the weaselly part. AMD very craftily made it sound like they're saying there's not an issue, but in fact they are fully confirming the finding from the researcher, including the suggested mitigation (enabling KPTI), they just don't recommend that you do it. The statement is deliberately short to avoid inclusion of too many details that might dispel these misleading impressions.
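If you are on Linux and want to see whether KPTI is actually on for your machine, the kernel reports it under sysfs. A small sketch; the path is the standard vulnerabilities directory, the exact status strings are what I would expect from current kernels and should be treated as an assumption, and the helper names are mine.

```python
from pathlib import Path
from typing import Optional

MELTDOWN_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def kpti_enabled(status: str) -> bool:
    """True if the kernel reports page-table isolation as the active mitigation."""
    return status.strip().startswith("Mitigation: PTI")


def read_status(path: Path = MELTDOWN_STATUS) -> Optional[str]:
    """Return the raw status line, or None if the file is missing or unreadable."""
    try:
        return path.read_text()
    except OSError:
        return None


status = read_status()
if status is None:
    print("status file not available on this system")
else:
    print(status.strip(), "->", "KPTI on" if kpti_enabled(status) else "KPTI off")
```

On AMD systems this typically reads "Not affected", which is exactly why KPTI stays off unless you force it (for example with `pti=on` on the kernel command line).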
This follows that same researcher (who previously discovered meltdown) uncovering a prior series of vulnerabilities in the cache ways predictor that also nullify KASLR... which AMD refused to patch because it "didn't leak actual data, only metadata"... the metadata being the page-table layouts. That one is still unpatched too - as the researchers note, AMD never actually mitigated this either, just more weasel words.
(this one literally doesn't even seem to have a security bulletin page for itself so I guess they have fully shoved this one down the memory hole now, but here's the news item from wayback) http://web.archive.org/web/20200325045817/https://www.amd.co...
After 6+ years of watching the community defend this behavior, downplay exploits from their favorite megacorporation, etc, it just gets old. Not liking how CTS labs did it or whatever is fine. It doesn't mean there's not a serious exploit, and so often that is where people end up with these AMD exploits, they like AMD so much that they argue against the existence or significance of the exploit, attack the researchers or whine about research grants, etc.
"Does this really deserve this CVE score" is a constant refrain in AMD vuln threads and it just gets so old. As tptacek noted... intel ME vulns are frontpage news and have people asking where they can buy a processor without ME in it. Literally nobody cares that AMD has had these vulnerabilities left open and unmitigated for years and years even though they're actually worse (as judged by the researcher who found both these issues and meltdown).
People would have flipped the fuck out if Intel left meltdown unpatched and released misleading statements implying that it wasn't an issue etc. It is wild just how much AMD is playing on story-mode difficulty with the average enthusiast, and honestly most people don't even realize they're doing it. And that drives me nuts - just decide if security issues are a problem or not, and if the answer is "not" then let's just turn all the mitigations off and see how long they remain un-exploited. If we want to have the security version of the drug-assisted olympics then fine, there is value in having dragsters that just do the thing as quickly as possible, right? But the double-standard people apply to anything AMD is crazy. Talk about your "tyranny of low expectations".