QEMU: virtfs permits guest to access entire host filesystem (chromium.org)
132 points by remx on Feb 28, 2017 | 43 comments

The v9fs code has been a major source of bugs. Hopefully no one's using that in production...


IIRC it was implemented to drive fast vm bootup/filesystem passthrough on some distributed HPC environments at IBM: https://landley.net/kdocs/ols/2010/ols2010-pages-109-120.pdf

While of course not as fast as a block device mapping, the reduction of copies (virtfs provides zero-copy data passthrough) makes virtfs considerably faster than NFS or CIFS from the guest. This means that even if you're pointing the virtfs at a network mount, you'll still see a speed improvement from the reduction in copies needed to get data on/off the wire.

Of course, the security model depends on getting the programming right, which is why the most common qemu execution paths use selinux/apparmor to implement access control on top of the virtualization itself.

If you want a real fun mind bender, try making qemu-KVM work from inside docker, without running a full privileged container. It's doable, but something of a challenge. FWIW, QEMU itself doesn't need root, only /dev/kvm access. (Which has also needed security attention in the past).
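To sketch the point above about not needing a privileged container: since QEMU itself only needs /dev/kvm, it's enough to grant the container that one device node. A minimal sketch (the image name and kernel paths are hypothetical placeholders):

```shell
# QEMU needs /dev/kvm but not root, so pass through just that device
# instead of running a fully privileged container.
docker run --rm -it \
  --device /dev/kvm \
  my-qemu-image \
  qemu-system-x86_64 -enable-kvm -m 512 -nographic \
    -kernel /vmlinuz -append "console=ttyS0"
```

Real setups usually also need networking (tap devices or slirp) and cgroup tuning, which is where the "something of a challenge" part comes in.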

In terms of speed-sensitive HPC workloads though, I bet v9fs is definitely used in production. Hopefully those people are also careful enough to use mandatory access control to sandbox QEMU, since that's definitely not a default libvirt-style setup.

Surprisingly, when I tried it, NFS outperformed virtio-9p handily across the board. I really was not expecting that result. Perhaps there is a way to tell the host to assume the VM is the only writer to the exposed path, so the client can do efficient caching?
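There is a guest-side knob along the lines the comment above asks about: the v9fs client's cache= mount option. A sketch, assuming the host exported the share under the hypothetical mount tag "hostshare":

```shell
# cache=loose tells the 9p client to cache dentries and file data
# aggressively, which is only safe if the guest is effectively the
# sole writer; a larger msize also helps throughput considerably.
mount -t 9p -o trans=virtio,version=9p2000.L,cache=loose,msize=262144 \
  hostshare /mnt/host
```

Without a cache= option the client revalidates on essentially every access, which goes a long way toward explaining NFS winning those benchmarks.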

I think most people running in production wind up using network-based protocols instead (NFS, SMB, etc.), which are much more security- and time-tested.

I've never heard of anyone using this kind of thing outside of personal VMs or development.

v9fs has some major advantages - the protocol is much simpler and more lightweight, which makes accelerating it over virtio much more tractable and effective. And it's nicely integrated into qemu so you don't need to bother with setting up a server.

Really, it's not competing with NFS/SMB, it's just aimed at fast passthrough for local filesystems.

However, having read a decent amount of the code myself, it is pretty crappy/sketchy.

However, it's not a lot of code, the basic idea is sound, and the protocol is simple and well designed; it really wouldn't take someone with the right skills long to rototill it and make some drastic improvements (there are performance gains to be had in there, too).

Couldn't one start by linking a scrubbing bridge for the protocol in Rust and grow from there?

I'd like to be able to use Rust in QEMU one day, but just at the moment it doesn't support all the platforms QEMU does. The sheer range supported by a standard GCC C toolchain is very hard for a new language to catch up to.

(It mostly has support for the CPUs we support and for the OSes but not always in all combinations, eg no ARM FreeBSD support. It also sometimes doesn't support the range of OS versions we do, eg no OSX 10.6 support, no support for Windows before Windows 7. And some things are only in Rust's tier 3, like OpenBSD support. We'd also need to dump some of the "probably nobody's using this really" older platform support code, like ia64 or Solaris.)

Oh, and Rust would need to be in the stable/LTS releases of distros before we could use it. That's going to take time to percolate through.

I'd imagine the way you'd use Rust for the use case described is to have it be a library for certain protocol implementations, and leave the bulk of the application in C. Then you use very little of the standard library, and you don't call the platform APIs directly from Rust. Over many years you might port all of QEMU eventually, but over many years Rust platform support will improve too.

For things like ARM FreeBSD support, given that Rust already supports ARM non-FreeBSD and FreeBSD non-ARM, I think the blockers are likely to be random #defines for system calls (unless the FreeBSD folks are using a different userspace ABI on ARM than, e.g., Linux on ARM). Those are interesting if your goal is to reimplement QEMU in Rust, but they're not interesting if your goal is to reimplement the 9P protocol in Rust.

It looks like Windows XP is pretty well supported, and the primary reason it doesn't have official support is that the OS itself is out of support? (You can also take sitkack's suggestion of just not enabling the security-focused code on a no-longer-security-supported OS.) Rust's standard library doesn't work on OS X 10.6 because of a change in TLS / threading models in 10.7, but again, if Rust isn't opening threads or allocating memory and is leaving that all to C, you don't care.

Does the QEMU project treat OpenBSD, ia64, and Solaris as effectively more supported than what Rust calls tier 3? Do you do automatic builds on those platforms? If I report a bug that shows up only on those platforms, are developers likely to have access to such systems to reproduce them? If not, having a tier 3 build of QEMU depend on a tier 3 build of Rust seems totally fine.

> It looks like Windows XP is pretty well supported, and the primary reason it doesn't have official support is that the OS itself is out of support?

Yes, this is accurate. It's a "best effort" kind of thing; the OS doesn't have certain primitives that would be needed for a full implementation of libstd; IIRC concurrency primitives are the main culprit?

If the bridge/proxy scrubbed the protocol, it could be opt in on the platforms that Rust supports. Unsupported platforms of course would still be vulnerable. But security isn't all or nothing. It really is too bad that the C backend to LLVM fell into disrepair, it would be so useful.

I see this pattern where an Open Source project supports so many platforms and old versions of operating systems that security bugs become pervasive, for a variety of reasons: 1) no knowledge of the old platform, 2) it's hard for new devs to test, 3) piles of ifdefs make the code complex and filled with magic. There are probably at most handfuls of users on those niche platforms.

Maybe it would be nice to freeze a build environment in a container, snapshot the repo, and make a legacy source archive should someone want to support BeOS or OpenVMS in the future.

The idea of Rust in QEMU excites me a lot.

> The sheer range supported by a standard GCC C toolchain is very hard for a new language to catch up to.

Agreed. We're working on it, but it'll take time. If you have any thoughts on prioritization here, we're always trying to figure that out.

> no support for Windows before Windows 7.

To be clear, this is "before Vista", and we support compiling _to_ XP, but not XP hosts.

It's under 7k lines of code, and what it needs to be doing is pretty straightforward. I'm very much hoping to be able to use Rust in the kernel someday, but really it's overkill for this.

In this case, the bug is a logic error, not a resource-related error. I have a hard time seeing how Rust would have helped.

> However, having read a decent amount of the code myself, it is pretty crappy/sketchy.

Have a look at vvfat.

We used it for security reasons, as we didn't want the guests to have a network link to the host. I know, I know.

Virtfs is a cluster fuck. Using rsync over it creates hundreds of thousands of file handles that never close. Reported to Ubuntu months ago, nothing fixed. Red Hat had the better idea of going nowhere near it. Canonical produce so much shovelware that they don't support, I won't get bit by this again.


Yet another example where SELinux could have mitigated this effect.

Strictly, that's not true. The exploit here is access to the filesystem environment of the host process from the guest. That happens entirely within the Qemu process and is invisible to SELinux.

What you're saying, I think, is that the SELinux could have been used to encapsulate the host process from the rest of the system. Which is true, but sort of specious. Traditional unix security paradigms (e.g. run every guest instance in its own UID, whatever) would have worked fine to mitigate the damage too.

I think you are mistaken. Anything syscall-related is seen by SELinux, which may accept or deny the request. Checking file access is one of the major uses of SELinux. libvirt comes with the appropriate integration with SELinux (and AppArmor) to ensure that a QEMU process won't be able to read or write a file it shouldn't (and to prevent it from spawning processes, transitioning state, or poking into the memory or disk of a neighboring QEMU).
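The libvirt integration described above (sVirt) is visible on the host. A sketch, assuming a Fedora/RHEL-style policy where each guest gets a unique MCS category pair:

```shell
# Each QEMU process runs in an svirt_t domain with per-guest MCS
# categories, and its disk images are labeled to match, so one guest
# cannot touch another guest's files even under the same qemu user.
ps -eZ | grep qemu              # expect a context like system_u:system_r:svirt_t:s0:c123,c456
ls -Z /var/lib/libvirt/images   # images labeled svirt_image_t with matching categories
```

The category pair is what makes the confinement per-guest rather than per-user; two guests with different c-pairs are mutually isolated by policy.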

SELinux/Apparmor defeat most of the latest vulnerabilities in QEMU.

You're talking about SELinux within the guest then? The exploit requires kernel execution, so it's not relevant. Anyone capable of running this would be capable of evading LSM hooks by definition.

Outside the guest, sure, you could use SELinux to firewall VMs from access to anything more than a minimal filesystem. But you could use any of a zillion other techniques too. It's a specious point not relevant to this exploit, which certainly cannot have been prevented by SELinux.

I am talking about SELinux on the host, which would totally prevent this exploit, as SELinux would not allow you to browse the host. You seem to say that nobody does that. libvirt uses SELinux out of the box to do exactly that.

You're talking past me and missing the point then. SELinux would absolutely not prevent this exploit: it allows the guest access to the filesystem seen by the host process, full stop. You're talking about deployment policy[1] that might help to mitigate VM exposure. And that's a fine idea. But it's not equivalent to saying that SELinux could have prevented this, and saying so in a security context is misleading users.

[1] Which has to be developed, after all. Simply saying you're "using SELinux" does nothing to prove to me you have no vulnerabilities or mistakes in your rule set.

Title says "virtfs permits guest to access entire host filesystem". First answer says "Yet another example where SELinux could have mitigated this effect". You say "That's not true".

"Entire host filesystem" is "/" and anything below, right? If your QEMU process is running in a SELinux context, there is no way it could access the "entire host filesystem".

As for your footnote, a mistake in the SELinux configuration is highly unlikely. Once your process is running in a context, you have to have an explicit transition to the unconfined context, or you have to relabel the "entire host filesystem" with the context of the QEMU process.

Maybe? I am not convinced. I have used qemu (with virtfs) in my interactive session to test possibly suspicious third-party apps, running as my regular interactive user.

With this vulnerability, the app might have escaped the sandbox and added evil stuff to my .bashrc. SELinux would be totally useless there.

In other circumstances, the damage would be much more localized. I mean, if you run qemu as a daemon, you are likely doing it as a non-privileged user anyway, so even regular unix permissions would be effective.

Anyone using qemu has disabled 9p for a really long time.

No one using qemu in production with untrusted VMs should be using 9pfs, but does anyone except RHEL/CentOS actually leave it out of the binary?

How do I disable it or is it disabled by default?

It is a device that you have to add explicitly to the virtual machine.
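To make the point above concrete, the share only exists if you ask for it on the command line. A sketch with hypothetical paths and mount tag:

```shell
# Expose a host directory to the guest over virtio-9p; omit -virtfs
# and the device simply is not there.
qemu-system-x86_64 -enable-kvm -m 1024 \
  -virtfs local,path=/srv/share,mount_tag=hostshare,security_model=mapped-xattr,id=share0 \
  disk.img

# To check whether your build has 9p compiled in at all:
qemu-system-x86_64 -device help 2>&1 | grep -i 9p   # no output means it is not built in
```

Distros that build with --disable-virtfs (as Red Hat reportedly does) make even an explicit -virtfs a hard error rather than a latent risk.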

Semi-hopping on the QEMU-bashing train, but recall that Google ripped it out for GCE.


Well, Google rightfully said that GCE does not need most QEMU features, so they made a simplified version. It is sad they did not release it.

On the other hand, there are people who need QEMU features: we have a QEMU-based test script for boot images which actually uses a few more exotic features to simulate the target environment.

The Google version is supposedly coupled heavily to their infrastructure and therefore would give little benefit to open source without also open sourcing a large portion of their infrastructure.

> without also open sourcing a large portion of their infrastructure.

Well, that’d not necessarily be bad for the dev community.

Red Hat also rips virtfs out of QEMU in RHEL, for what it's worth.

Exactly. Ubuntu didn't and it wasted weeks of development for me.


Ripped it out implies that at one point they used it.

No, it means they run binaries compiled with source code from which this feature set has been removed.

That's not a correct interpretation either; it's noted in the linked article that GCE uses a proprietary VMM instead of QEMU which itself was "ripped out" of the standard KVM+QEMU arrangement.

I probably would have chosen gentler phrasing.

From the title I understood this was a feature that had arrived in QEMU; it seems like it could have its place on development machines etc., where the only reason you're using a VM is to get access to some alternative architecture.

Um, silly question. Does this affect people using QEMU to run ReactOS, FreeDOS, and their proprietary counterparts to play old games and use old programs that don't work as well in DOSBox?

Neither ReactOS nor FreeDOS nor their proprietary counterparts have drivers for virtio-9p, so most likely you are not using the device and are not vulnerable.

9p as in Plan 9 from Bell Labs? I only use that from LiveCD - and mostly as a goof and a basic test for a machine.

If you run malicious binaries attacking this, sure.

If you're just playing old games, you are probably fine. (Likely wouldn't need to worry about other security exploits either for that case.)

well then, plan9 finally becomes useful for something!

