Attacking Firecracker: AWS' MicroVM Monitor Written in Rust (graplsecurity.com)
212 points by pentestercrab on Sept 8, 2022 | hide | past | favorite | 29 comments



This is a pretty good writeup of a long-fixed Firecracker bug (CVE-2019-18960).

Firecracker is a KVM hypervisor, and so a Firecracker VM is a Linux process (running Firecracker). The guest OS sees "physical memory", but that memory is, of course, just mapped pages in the Firecracker process (the "host").

Modern KVM guests talk to their hosts with virtio, which is a common abstraction for a bunch of different device types that consists of queues of shared buffers. Virtio queues are used for network devices, block devices, and, apropos this bug, for vsocks, which are a sort of generic host-guest socket interface (vsock : host/guest :: netlink : user/kernel, except that Netlink is much better specified, and people just do sort of random stuff with vsocks. They're handy.)

The basic deal with managing virtio vsock messages is that the guest is going to fill in and queue buffers on its side expecting the host to read from them, which means that when the host receives them, it needs to dereference pointers into guest memory. Which is not that big of a deal; this is, like, some of the basic functioning of a hypervisor. A running guest has "regions" of physical memory that correspond to mapped pages in Firecracker on the host side; Firecracker just needs to keep tables of regions and their corresponding (host userland) memory ranges.

This table is usually pretty simple; it's 1 entry long if the VM has less than 3.5G, and 2 entries if more. Unless you're on ARM, in which case it's always 1 entry, and the bug wasn't exploitable.

The only tricky problem here for Firecracker is that we can't trust the guest --- that's the premise of a hypervisor! --- and a guest can try to create fucky messages with pointers into invalid memory, hoping that they'll correspond to invalid memory ranges in the host that Firecracker will dereference. And, indeed, in 2019, there was a case where that would happen: if you sent a vsock message, which is a tuple (header, base, size), where:

1. The guest had more than 3.5G of memory, so that Firecracker would have more than one region table entry

2. The base address landed in some valid entry in the table of regions

3. base+size landed in some other valid entry in the table of regions

There are two bugs: first, a validity check on virtio buffers doesn't check to make sure that both base and base+size are in the same, valid region, and second, code that extracts the virtio vsock message does an address check on the buffer address with a size of 1 (in other words, just checking to see if the base address is valid, without respect to the size).

At any rate, because the memory handling code here deals with raw pointers, this was done in Rust `unsafe{}` blocks, and so this bug combination would theoretically let a guest trick Firecracker into writing into host memory outside of a valid guest memory range.

The hitch, which is as far as I know fatal: there's nothing mapped in between regions in x86 Firecracker that you can write to: between a memory region and the no-mans-land memory region outside it, there always happen to be PROT_NONE guard pages†, so an overwrite will simply kill the Firecracker process. Since the attacker here already controls the guest kernel, crashing the guest this way doesn't win you anything you didn't already have.

† And now, post-fix, there are deliberately placed PROT_NONE guard pages around regions.


The fact that this doesn't seem exploitable shows the value of defense in depth: although numerous safety measures were defeated, exploitation was ultimately blocked by a guard page. If that guard page hadn't been there, the outcome could have been very bad. Still, it got closer to exploitable than anyone is comfortable with.


Definitely. It could have very easily gone the other way - AFAIK that guard page was not to defend against this sort of issue. What's great is that Firecracker now does have explicit guard pages that they allocate in response to this, which to me indicates that they're not just a project that patches a vulnerability but thinks through how to protect against classes of vulnerability.


They do all sorts of things for security, which is one of the "tenets" (an Amazon thing) of the project. For instance: the reason we haven't had easy access to GPUs is that they don't fit easily into the Firecracker architecture.


> Currently, io_uring system calls are included in Firecracker’s seccomp filter. Because it redefines how system calls are executed, io_uring offers a seccomp bypass for the supported system calls. This is because seccomp filtering occurs on system call entry after a thread context switch, but system calls executed via io_uring do not go through the normal system call entry. Therefore, Firecracker’s seccomp policy should be treated as its union with all system calls supported by io_uring.

...

> Because of the nature of system call filtering via seccomp, io_uring still presents a major security disruption in sandboxing.

This is pretty interesting as io_uring has seen a lot of press as the hot new thing.


The author of this also did a writeup of an io_uring exploit a while back that you might find interesting https://www.graplsecurity.com/post/iou-ring-exploiting-the-l... (the September 8 date is definitely wrong, probably an artifact from moving blogging platforms)


io_uring is seriously undercooked as an idea. It's had a ton of CVEs already, and the fact that you can tell the kernel to go off and do something for you later is a big red flag for secure system design. That kind of facility needs to be very carefully designed, but io_uring was just dashed off in a mad hack without really thinking it through.


TBH none of the kernel really gets designed with security in mind. It's a couple of people constantly trying to play catch up to get shit right where they can.


pity -- it's quite shiny from a distance.


Can't the kernel executor just check the seccomp rules when it pulls tasks off the iou queue?


There has been quite a bit of LWN coverage of io_uring security aspects, most recently https://lwn.net/Articles/902466/

The basic gist seems to be that it's just not currently the upstream developer's focus


Fundamentally, yeah sure, it can do whatever it wants. It just doesn't right now.


I'd love to see a move away from bpf hooks for security and ossify more of the key things as formal userspace API.


A lot of people have proposed using Rust for OS development. There are even plans to write Linux kernel modules in Rust.

I think this article is a very good demonstration of why Rust is not a silver bullet. It was created with userspace applications in mind, and a system application is an entirely different beast.

Think about it this way: in C it is easy to shoot yourself in the foot. But in kernel space you can easily blow up the entire building.


It doesn't have to be a silver bullet to be a net improvement. There's a ton of other code in firecracker that is using safe Rust, and without that safety we could have had this bug and more.

This bug exists because this is exactly C-style code: manipulating buffers through raw pointers without lengths, instead of slices.


This bug is one of the exceptions that proves the rule about memory safety: it occurred because the KVM interface requires working with memory-unsafe primitives. Raw pointers in idiomatic Rust code are practically FFI objects.

Almost nobody thinks Rust is a silver bullet, but eliminating memory-unsafety is an absolute good; the more of it we can get rid of in the kernel, the better off we'll be.


I kind of took the opposite away from this work. I think that after digging into Firecracker it was clear that exploitation of Rust code is extremely difficult.


Long story short: unsafe code can still be a source of vulnerabilities, even in a memory and thread-safe language. To me this sounds glaringly obvious.


tl;dr: The article describes the details of Firecracker's architecture and CVE-2019-18960, which (as you can imagine) got fixed long ago.


I was expecting a demo of an exploit, but what I got was code analysis and verbal handwaving. Anyone else feel like something was missing here?

Edit: I did learn cool new stuff tho, thanks.


Hi, author here.

I walk through the process of developing the exploit and primitives, and was upfront that I ran into a mitigation which thwarted my exploit strategy. Similar to other exploit writeups I've done, I try to focus on the big picture and illustrate the idea (through writing and diagrams) while still being technically rigorous. Exploit development is much more reading code than it is writing it.

If you have any suggestions for improvement, or want to tell me which sections felt like handwaving to you, please let me know! Better yet, if you have an idea on how to defeat the mitigation so I can complete the exploit, I would love to discuss it.

BTW: Failing to produce an exploit for a very powerful bug like this, despite my best efforts, was considered a giant win for the security review of Firecracker.


Thanks for sharing this writeup, Valentina!

>If you have any suggestions for improvement

Not GP, but I struggled to follow some parts due to passive voice.[0] There were a lot of sentences that omit the actor of the sentence, so I had a hard time understanding which component performed which action.

For example:

>If specified in flags, descriptors can be chained together with next containing the descriptor table index of the chained descriptor. In virtio-vsock, buffers in a descriptor chain are used to construct a vsock packet. Something to note at this point: the buffer information in the descriptor comes from the guest, and it should be treated as untrusted.

So when I read that, I have to mentally walk back and figure out:

* can be chained together -> Who chains descriptors together?

* are used to construct a vsock packet -> Who constructs the packet?

* should be treated as untrusted -> Who should treat it as untrusted?

From context, I can figure these things out, but the Firecracker/kernel concepts you're explaining are occupying most of my mental bandwidth. Any bandwidth you can free up with simpler sentences makes it easier for me to focus on the main subject of the blog.

[0] https://writing.wisc.edu/handbook/style/ccs_activevoice/


Firstly thanks for sharing this with everyone.

I've read over this paragraph a few times now and I'm really struggling to see how their protections don't defend against the described issues -

"There are two problems that could occur; the base and result address may belong to two different regions, and the base address may not even exist in a valid region."

"the base and result address may belong to two different regions"

if addr >= region.guest_base && addr < region_end(region)

Surely region_end(region) stops it belonging to two different regions, as you're using one region in the for loop? I'm probably being thick!

"the base address may not even exist in a valid region."

Again surely if the addr < region_end(region) this would ensure it's within a valid region?

Is there any other info you can provide so that my simple brain can understand?


It looks like the author wasn't able to pull all the gadgets together into a working exploit, ultimately stymied by the fact that Rust surrounds the stack with guard pages (which are intended to catch accidental stack overflow, but fortuitously appear to provide some protection against deliberate exploits as well). But it could easily have gone the other way, and exploits there might still be possible (though obviously the code in question is many years out of date by now). It still serves to demonstrate the importance of auditing your unsafe blocks, the value of unsafe blocks in the first place (which is, I suspect, how this bug was found), the value of additional tools to verify unsafe code (e.g. Miri, Kani), and the reason why Rust still goes to all the trouble of implementing runtime mitigations despite its memory safety guarantees.


Well, we attacked Firecracker and this is what we got, haha. Not every attack is going to lead to a full end-to-end, reliable exploit, although we've posted those in the past too.

The key here wasn't to produce an exploit. That would have been interesting, but ultimately not the entire goal. The key was to understand "how do we use Firecracker in the safest possible way for our use case?". To do that we picked one of the CVEs that looked like it could be exploitable and dug into it.

We learned a ton about Firecracker and KVM and walked away with some mitigations we can implement such that even if the bug had been exploitable the attacker would have more hurdles to jump through. Specifically, we'll be working to harden the guest operating system such that the untrusted code will have a difficult time escalating to root/kernel, which is a prerequisite for this sort of attack.


> Firecracker is comparable to QEMU; they are both VMMs that utilize KVM, a hypervisor built into the Linux kernel.

That's not accurate: while KVM is mandatory for Firecracker, it isn't for QEMU.


It is accurate with a charitable reading, and not accurate with an uncharitable one.

> Firecracker is comparable to QEMU

This is charitably true. They can be used for similar high-level tasks, and thus they are reasonably comparable.

> that utilize KVM

Both qemu and firecracker use KVM, it happens that qemu supports other hypervisors/options.

"utilize KVM" does not have to be read uncharitably as _only_ uses that.


Ideally, the OP should have clarified it as "QEMU-KVM", since QEMU has a plethora of use cases beyond being a VMM (e.g. qemu-user, qemu atop Hyper-V/Xen/Hypervisor.Framework, cross-architecture emulation) that are more-or-less used just as often.

It shouldn't be a problem for already knowledgeable readers, but it _might_ give others the false impression that QEMU only works as a KVM-based VMM.


The internet would be a friendlier place if people were more willing to give each other the benefit of the doubt.

