
QEMU VM Escape - ngaut
https://blog.bi0s.in/2019/08/24/Pwn/VM-Escape/2019-07-29-qemu-vm-escape-cve-2019-14378/
======
stefanha
QEMU developer here. Some context on the impact and the security architecture
of QEMU:

1\. Production VMs almost exclusively use tap networking, not slirp. This CVE
mostly affects users running QEMU manually for development and test VMs.

2\. Slirp
([https://gitlab.freedesktop.org/slirp/libslirp](https://gitlab.freedesktop.org/slirp/libslirp))
is part of the QEMU userspace process, which runs unprivileged and confined by
SELinux when launched via libvirt. To be clear: this is not a host ring-0
exploit!

3\. Getting root on the host or accessing other VMs requires further exploits
to elevate privileges of the QEMU process and escape SELinux confinement.

More info on QEMU's security architecture:
[https://qemu.weilnetz.de/doc/qemu-doc.html#Security](https://qemu.weilnetz.de/doc/qemu-doc.html#Security)

For a more detailed overview of how QEMU is designed to mitigate exploits like
this, see my talk from KVM Forum 2018:
[https://www.youtube.com/watch?v=YAdRf_hwxU8](https://www.youtube.com/watch?v=YAdRf_hwxU8)
[https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf](https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf)

~~~
justinclift
> confined by SELinux

Only on platforms that use SELinux though.

~~~
stefanha
For Ubuntu users there is AppArmor support in libvirt too:
[https://wiki.ubuntu.com/LibvirtApparmor](https://wiki.ubuntu.com/LibvirtApparmor)

QEMU runs on other operating systems like *BSD, macOS, and Windows. It is less
mature on those platforms and it's safer to avoid running untrusted VMs on
those platforms.

------
userbinator
Looks like this was something that QEMU inherited when it took code from
[https://en.wikipedia.org/wiki/Slirp](https://en.wikipedia.org/wiki/Slirp) .
Even after reading that page, I'm still not sure how it works; is it like a
SOCKS proxy?

~~~
kijiki
Slirp is a little like NAT, but implemented differently.

It creates what looks like a virtual NIC (literally in qemu's case, indirectly
by SLIP in the original Slirp), and reassembles the packets it gets coming in
from the guest OS or SLIP user. For example, a SYN packet gets turned into a
call to connect(), and a data packet gets turned into a write() on the
appropriate TCP socket FD. An RST packet gets turned into a call to close().
The reverse happens in the other direction: based on the read() data, fake TCP
packets are generated, a closed socket gets mapped to an RST packet, and so
forth.

From the outside, it looks like the guest (or SLIP user) is NATed through the
host's IP address. But really it is just reassembling the intention of the
guest based on the packets it is sending and calling host kernel functions to
cause the same effects.
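
A minimal sketch of that mapping, assuming a heavily simplified view of the guest's traffic. `GuestSegment` and `Flow` are made-up names for illustration, not types from slirp, and real slirp also has to track sequence numbers, ACKs and timers:

```rust
use std::io::{self, Write};
use std::net::{Shutdown, SocketAddr, TcpStream};

/// Hypothetical, simplified view of what the guest sent.
enum GuestSegment {
    Syn(SocketAddr),
    Data(Vec<u8>),
    Rst,
}

/// One guest TCP flow and the host socket that stands in for it.
struct Flow {
    sock: Option<TcpStream>,
}

impl Flow {
    fn new() -> Self {
        Flow { sock: None }
    }

    fn handle(&mut self, seg: GuestSegment) -> io::Result<()> {
        match seg {
            // SYN: open a real connection from the host (connect()).
            GuestSegment::Syn(dest) => {
                self.sock = Some(TcpStream::connect(dest)?);
            }
            // Payload: write the bytes to the host socket (write()).
            GuestSegment::Data(bytes) => match self.sock.as_mut() {
                Some(sock) => sock.write_all(&bytes)?,
                None => {
                    return Err(io::Error::new(
                        io::ErrorKind::NotConnected,
                        "data before SYN",
                    ))
                }
            },
            // RST: tear the host socket down (close()).
            GuestSegment::Rst => {
                if let Some(sock) = self.sock.take() {
                    // Ignore errors: the peer may already be gone.
                    let _ = sock.shutdown(Shutdown::Both);
                }
            }
        }
        Ok(())
    }
}
```

The remote end only ever sees an ordinary connection from the host's own TCP stack, which is why this looks like NAT from the outside.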

~~~
ptman
Is that similar to what sshuttle does?

~~~
dahfizz
Not really. Sshuttle is really just a convenient wrapper around plain old ssh.
It creates an SSH tunnel to the host you specify and forwards all traffic
through that. It's more like a VPN.

------
megakluntjes
As far as I know many projects (Podman, Virtualbox, Rootless Docker, Usernetes
etc) use a fork of slirp (e.g. slirp4netns).

Let's hope that these projects are not affected, too.

~~~
AkihiroSuda
slirp4netns v0.2.3, v0.3.2, and v0.4.0-beta.3 are already patched for this
CVE.

[https://github.com/rootless-containers/slirp4netns/security/...](https://github.com/rootless-containers/slirp4netns/security/advisories/GHSA-gjwp-vf65-3jqf)

Also, v0.4.0-beta.2+ can harden its own process by unsharing the mount
namespace and calling pivot_root into an empty dir that only contains /etc and
/run with the noexec mount option. v0.4.0-beta.4+ additionally supports
seccomp filters.

------
chrissnell
It’s kind of amazing that SLiRP is still in use. Back in 1995, I was working
tech support at an ISP and we used to kick people off of shell accounts all
the time for using it to get a cheaper dialup connection. It was usually the
MUDers (the bottom of the barrel of internet users in ‘95).

~~~
quickthrower2
Why was a shell account cheaper?

~~~
mlyle
Shell accounts were cheaper to provide because people spent most of their time
doing things that were purely local-- news, mail, etc-- so the bandwidth per
user needed was trivial.

Users running Mosaic/Netscape on dialup, by contrast, were likely to be pegging
their modem link the whole time, and almost all of those bits were bits you
needed to buy from a transit provider over an expensive leased line.

~~~
ryacko
Could a user install anything on those shell accounts?

~~~
mlyle
Generally you'd have a small disk quota, and you'd get yelled at /
disconnected if you used too much CPU or RAM, and you'd not be allowed to run
processes while you're not dialed up. But other than that, you could compile
and run stuff like slirp, sure.

Then ISPs didn't like people running SLIRP, so often there were "no SLIRP"
rules. :P

------
segfaultbuserr
If I read it correctly, the attack has three steps:

1\. Exploit a miscalculated pointer to write arbitrary data.

2\. Exploit an ASLR infoleak to figure out the target.

3\. Use (1) to create a fake timer with a callback to "system()".

Can some forms of Control Flow Integrity mitigate this type of attack?

W^X is useless in this case, but if Control Flow Integrity offers code pointer
target verification, it could have a chance to catch the final bogus callback,
am I correct?

~~~
bonzini
Unfortunately all shared library entry points have to be marked as possible
destinations for indirect jumps. However, both seccomp and SELinux can block
the execve system call.

------
tus88
> which is a pointer miscalculation in network backend of QEMU

Out of interest does Rust prevent this kind of mistake?

~~~
geofft
Yes.

At a surface level, the bug is in doing raw pointer arithmetic: determining
the size of some value by subtracting two pointers from each other,
incorrectly assuming that both pointers are within the same object. There's a
codepath where one is not, and therefore this size computation is incorrect.
Later, that size is added to another pointer, allowing for out-of-bounds
access, overwriting other variables.

Rust doesn't let you subtract two pointers from each other. Even unsafe Rust
does not; you'd have to cast the pointers to integers, first, because finding
the difference between two unrelated pointers is a fundamentally meaningless
operation. (Indeed, it's undefined behavior in C, and Rust compiles through
LLVM and would inherit the same optimization passes that wish to consider
things UB, so it doesn't pass a request through to LLVM that's going to be
undefined.) And safe Rust doesn't let you index to an arbitrary spot in an
array / buffer without a bounds check, so even if you got a nonsense offset,
it would crash instead of overwriting unrelated values.
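
A small sketch of both halves of that argument, using only the standard library. `byte_delta` and `checked_write` are hypothetical helper names, and the sketch returns an error where plain indexing (`buf[offset]`) would panic:

```rust
/// Byte distance between two addresses, computed on plain integers.
/// The result is only meaningful if both pointers lie in the same object.
fn byte_delta(q: *const u8, base: *const u8) -> usize {
    (q as usize).wrapping_sub(base as usize)
}

/// Bounds-checked write: a bad offset becomes an error, not a stray store.
fn checked_write(buf: &mut [u8], offset: usize, value: u8) -> Result<(), String> {
    match buf.get_mut(offset) {
        Some(slot) => {
            *slot = value;
            Ok(())
        }
        None => Err(format!(
            "offset {} is outside a {}-byte buffer",
            offset,
            buf.len()
        )),
    }
}
```

With a delta taken from two pointers into the same buffer the write lands where expected; with a delta taken from an unrelated object the offset falls outside the slice and the write is refused.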

At a slightly higher level, it seems like the underlying issue here (if I'm
reading the article right) is that struct mbuf has two ways of representing
the data: the array member m_dat and the pointer m_ext. Which one you're
supposed to use is represented by a flag. The code correctly kept track of
which one to use in all cases _except one_. Entirely apart from the memory
safety stuff, Rust gives you tagged enums (enums with data, aka "sum types" in
functional programming) with the property that you can only access data inside
a particular enum variant if the variable you're looking at is actually of
that variant. So, for instance, you could have something roughly like:

    
    
        enum MData<'a> {
            Internal([u8; 32]), // buffer stored inline
            External(&'a [u8]), // ptr to external storage
        }
    

and syntactically there's no way to get a ptr out of an Internal or a buffer
out of an External, so you couldn't have the logic confusion that led up to
the memory unsafety. Even if you could do raw pointer arithmetic in Rust,
you'd still get it right:

    
    
        let delta = match &mbuf.data {
            MData::Internal(buffer) => q as usize - buffer.as_ptr() as usize,
            MData::External(ptr) => q as usize - ptr.as_ptr() as usize,
        };
    

so it's impossible to forget to check the flag. (In this case they do check
the flag but it sounds like they're not checking the _right_ flag or
something? Or the flag is set too early? I don't totally follow the
description, but if it's something like that, using a Rust enum would
guarantee that the "flag" accurately matches whatever you're looking at.)
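
For anyone who wants something that compiles, here is a rough self-contained version of the same idea. The `Mbuf` layout, the `len` field and the method names are invented for illustration and are not libslirp's; the external storage is an owned `Vec<u8>` to keep lifetimes out of the picture:

```rust
/// Where an mbuf's payload lives. Names are illustrative, not libslirp's.
enum MData {
    Internal([u8; 32]),
    External(Vec<u8>),
}

struct Mbuf {
    data: MData,
    len: usize, // bytes in use
}

impl Mbuf {
    /// The bytes can only be reached through the variant that holds them.
    /// Returns None if `len` claims more bytes than the storage has.
    fn bytes(&self) -> Option<&[u8]> {
        let storage: &[u8] = match &self.data {
            MData::Internal(buf) => &buf[..],
            MData::External(vec) => &vec[..],
        };
        storage.get(..self.len)
    }

    /// Move to external storage; the "flag" changes together with the data.
    fn make_external(&mut self) {
        if let MData::Internal(buf) = &self.data {
            let copy = buf[..self.len.min(buf.len())].to_vec();
            self.data = MData::External(copy);
        }
    }
}
```

Because the variant tag and the storage are replaced in a single assignment, there is no window in which the tag says "external" while the code is still looking at the inline buffer.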

The memory safety stuff is great, but I really think that having a richer type
system like this is more fundamentally what prevents bugs, compared to C where
all you have is numbers, pointers, structures, and structures-where-things-
overlap. (Another good use of this is nullable pointers that force you to do
null checks before dereferencing them, and a little more broadly, this pattern
also gives you locked data that forces you to take the lock before
dereferencing the data, avoiding issues where you take the wrong lock, which
could end up as memory unsafety eventually.)
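
A short sketch of those two patterns with standard library types. `first_byte`, `Stats` and `record_packet` are hypothetical names chosen for the example:

```rust
use std::sync::Mutex;

/// `Option<&[u8]>` plays the role of a nullable pointer.
fn first_byte(data: Option<&[u8]>) -> Option<u8> {
    // `?` returns early on `None`; there is no way to skip the check.
    let bytes = data?;
    bytes.first().copied()
}

/// The counter lives inside its mutex, so it can only be reached via `lock()`.
struct Stats {
    packets: Mutex<u64>,
}

fn record_packet(stats: &Stats) -> u64 {
    // The guard is the only handle to the value and unlocks when dropped.
    let mut guard = stats.packets.lock().unwrap();
    *guard += 1;
    *guard
}
```

In both cases the check is not a convention the programmer has to remember: the value is simply unreachable without going through it.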

FWIW there are a few hypervisor projects in the same space as QEMU that are
written in Rust: AWS's Firecracker and Chrome's crosvm come to mind.

~~~
tsimionescu
> The memory safety stuff is great, but I really think that having a richer
> type system like this is more fundamentally what prevents bugs

I know this is only tangentially related to your point, but all the research I
have read about this points to the opposite conclusion - richer and stricter type
systems don't have a proven effect on bugs, whereas memory safety is
guaranteed to eliminate whole classes of bugs.

One guess why this would be true, despite your good example of a type system-
level fix that would have prevented this bug entirely, is that memory safety
is automatic (or at least opt-out), while the richer types are opt-in: nothing
in Rust (or Java etc) would have prevented writing the original C, the
programmer would have had to think about using the enum or other equivalent in
order for the compiler to help. Granted, in this case they would almost
certainly have done so, as enums are just so much nicer than flag checking,
but for other type-level solutions the same may not happen.

~~~
geofft
Interesting, I'm very curious if you have pointers to this! Intuitively I feel
like I write better code with a better type system, but that argument makes
sense to me :-(

One thing I'm curious about is if there's a slightly different property than
"richer type system" that helps. In Rust but not necessarily in other
languages with good type systems, you can't leave a struct member
uninitialized, so perhaps the easiest way to write this code is actually to
use an enum if you know you'll only ever use one or the other. That might be
close enough to the "automatic" property?

------
sansnomme
If the VM is run in a hypervisor, will this be able to break out of the
hypervisor?

~~~
darren0
Yes, but slirp isn't used in a production-grade setup. It's more of a simple,
slow default that's guaranteed to work, so it's used mostly in development
setups.

~~~
bonzini
Note that since the last version of QEMU (4.1, released a few days ago) slirp
has been moved out into a separate library hosted at
[https://gitlab.freedesktop.org/slirp/libslirp](https://gitlab.freedesktop.org/slirp/libslirp).

There is interest in using it in container runtimes as well, and hopefully
this will give slirp more love. It's very old code that most QEMU developers
wouldn't have touched with a ten-foot pole...

~~~
rtpg
>Testing

> Unfortunately, there are no automated tests available.

> You may run QEMU -net user linked with your development version.

Not trying to shame anyone here, but I do think that this would be a good
entry point for anyone trying to help this project out. Hell, this might be a
fun little target for property testing or other kinds of generative testing
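
To make that concrete, here is a rough idea of what a generative test could look like, written against a toy reassembler rather than libslirp itself. Everything here (`reassemble`, `XorShift`, `check_roundtrip`) is made up for the sketch, and a real effort would more likely use a property-testing crate or a fuzzer against the actual C code:

```rust
/// Toy reassembly: copy each (offset, bytes) fragment into a fixed-size packet.
/// Returns None instead of writing past the end. Not libslirp's code.
fn reassemble(total_len: usize, frags: &[(usize, Vec<u8>)]) -> Option<Vec<u8>> {
    let mut packet = vec![0u8; total_len];
    for (offset, bytes) in frags {
        let end = offset.checked_add(bytes.len())?;
        packet.get_mut(*offset..end)?.copy_from_slice(bytes);
    }
    Some(packet)
}

/// Tiny xorshift generator so the sketch needs no external crate.
/// The seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Property: splitting a payload and feeding the pieces back in any order
/// reproduces the payload exactly.
fn check_roundtrip(seed: u64, rounds: usize) -> bool {
    let mut rng = XorShift(seed);
    for _ in 0..rounds {
        let len = (rng.next() % 200) as usize + 1;
        let payload: Vec<u8> = (0..len).map(|_| rng.next() as u8).collect();
        let chunk = (rng.next() % 16) as usize + 1;
        let mut frags: Vec<(usize, Vec<u8>)> = payload
            .chunks(chunk)
            .enumerate()
            .map(|(i, c)| (i * chunk, c.to_vec()))
            .collect();
        // Shuffle by swapping each slot with a random one.
        for i in 0..frags.len() {
            let j = (rng.next() % frags.len() as u64) as usize;
            frags.swap(i, j);
        }
        if reassemble(len, &frags) != Some(payload) {
            return false;
        }
    }
    true
}
```

A second property worth checking is that malformed fragments (offsets past the end, lengths that overflow) are rejected rather than written, which is the class of mistake behind this CVE.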

------
ncmncm
I don't see why they even bother to reassemble the packet fragments. If
they're well formed, pass 'em along!

~~~
auscompgeek
You wouldn't be able to simply pass IP packets with usermode networking. It
may also be a better use of bandwidth to reassemble and refragment the packets
if the MTU of the host is smaller than the MTU advertised to the VM (although
I suppose the hypervisor should try to match the MTU).
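
As a sketch of the refragmenting half, this plans how a reassembled payload would be split for a smaller MTU. It assumes a 20-byte IPv4 header with no options; `Fragment` and `plan_fragments` are hypothetical names, not anything from slirp:

```rust
/// One planned IPv4 fragment: payload offset, payload length, "more fragments" bit.
#[derive(Debug, PartialEq)]
struct Fragment {
    offset: usize,
    len: usize,
    more: bool,
}

/// Split `payload_len` bytes for a link with the given MTU.
/// Assumes a 20-byte IPv4 header without options.
fn plan_fragments(payload_len: usize, mtu: usize) -> Option<Vec<Fragment>> {
    const HEADER: usize = 20;
    // Fragment offsets are counted in 8-byte units, so every fragment
    // except the last must carry a multiple of 8 payload bytes.
    let per_fragment = mtu.checked_sub(HEADER)? / 8 * 8;
    if per_fragment == 0 {
        return None;
    }
    let mut plan = Vec::new();
    let mut offset = 0;
    while offset < payload_len {
        let len = per_fragment.min(payload_len - offset);
        plan.push(Fragment {
            offset,
            len,
            more: offset + len < payload_len,
        });
        offset += len;
    }
    Some(plan)
}
```

For a 3000-byte payload and an MTU of 1500 this gives two 1480-byte fragments followed by a 40-byte one.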

