Virtio-fs: shared file system for virtual machines (kernel.org)
224 points by diegocg 9 days ago | 60 comments





Some background reading that may help this make sense to folks from outside the domain: https://www.ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf

Block storage (virtio-block) is way simpler than file systems, both in the virtio implementation and in the host filesystem implementation. Simple is good, both for security reasons and because it makes it easier to reason about performance isolation between guests. The cost is obviously in features, and in making it hard to share data (and very hard to multi-master data) between multiple VMs. To see how simple virtio-block is, take a look at Firecracker's implementation (https://github.com/firecracker-microvm/firecracker/blob/mast...).
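To give a feel for how small the block interface is, here is a rough sketch of the request header as I understand it from the virtio specification (a type, a reserved field, and a sector number, followed by the data buffers and a status byte). The helper names are made up for illustration and this is not taken from Firecracker's code.

```python
import struct
from typing import Tuple

# Request types as listed in the virtio specification's block device section.
VIRTIO_BLK_T_IN = 0     # read from the device
VIRTIO_BLK_T_OUT = 1    # write to the device
VIRTIO_BLK_T_FLUSH = 4  # flush the write cache

SECTOR_SIZE = 512  # virtio-blk sector numbers are in 512-byte units

# Header layout: le32 type, le32 reserved, le64 sector (16 bytes total).
_HEADER = struct.Struct("<IIQ")


def pack_request_header(req_type: int, sector: int) -> bytes:
    """Build the 16-byte header that precedes the data buffers."""
    return _HEADER.pack(req_type, 0, sector)


def unpack_request_header(raw: bytes) -> Tuple[int, int]:
    """Return (type, sector) from a 16-byte header."""
    req_type, _reserved, sector = _HEADER.unpack(raw)
    return req_type, sector


def byte_offset(sector: int) -> int:
    """Translate a sector number into a byte offset in the backing file."""
    return sector * SECTOR_SIZE
```

A host implementation mostly has to map that sector number to an offset in a file or device and do the read or write, which is a large part of why it is so much easier to audit than a file system protocol.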

The existing alternative, as the post talks about, is 9p, which has a lot of shortcomings, so this is interesting. I suspect its use is mostly going to be in development environments and client-side uses of containers, and server-side users of virtualization will likely stick with block (or a filesystem implemented in guest userland).


Is the use-case here really related to that of block storage?

I imagine a given pool of blocks is reserved for one VM at a time. I wouldn't want to share the storage live between two VMs unless it was being accessed by some application specifically designed to hit a shared block store.

The point of something like virtio-fs would be to allow multiple VMs to share a file tree. It is a compromise between bind mounts (which are simpler, but only work for containers) and network filesystems (which work remotely, but are less efficient and less well behaved).


The point is to avoid the overhead of the hypervisor emulating a network and network device and the VM operating them.

Thanks for the link to the paper. Even though I knew the basics, this paper is really well written and does a good job of explaining the details.

> I suspect its use is mostly going to be in development environments and client-side uses of containers

Containers don't need this. It's only useful for virtual machines.


The envisioned use case is Kata Containers https://katacontainers.io (formerly "Clear Containers"), which behaves like a container runtime but is actually a virtual machine with its own kernel to enhance isolation from the host. Since it runs its own kernel, it needs something like this to access files on the host.

The advantage compared to running normal VMs is that it's interoperable with things that expect a container runtime like Docker or rkt - e.g., Kubernetes can run Kata Containers - and the overhead / density / startup speed is much closer to that of actual containers (in the sense of namespaces+cgroups) than traditional VMs.


And the advantage compared to running containers is...?

> a virtual machine with its own kernel to enhance isolation from the host

Whether that’s something you care about or think is necessary is left as an exercise to the reader (i.e. it depends).


That's too short an answer for this topic. The container world certainly needs a lot of storage problems resolved. So, why do containers not need this? What part of it do containers not need?

Containers have no access to virtio, so they can't use this. It's that simple.

Containers are not virtual machines.


That's one way to look at it, but with sufficient effort on the quality of the host filesystem, you can offer better reliability, and offer features that are expensive/difficult to do properly from inside a VM. Streaming transaction logs and snapshotting (and if you're adventurous, deduplication), for example.

The email mostly talks about performance, but could this technique finally give us VM shared folders with proper inotify support (at least on Linux hosts)? That has always been my number one gripe with using Vagrant for local development.

I sure hope so! I'm encouraged to see a project potentially tackling this need. The current "best practice" of using a Unison-synced directory between the host and your VM (and even then maybe to a Docker volume bind mount) is very error-prone and comes with many pitfalls.

Mine as well, I have this horrible NFS setup that kinda/sorta works but this (if what you query is the case) would be awesome.

I even looked at writing my own 'vagrant' (to fit exactly my use case of Linux on Linux via KVM and deployed via ansible) because the NFS thing and VirtualBox issues mar what is otherwise a great idea.


Vagrant is quite extensible; you can plug in KVM via QEMU and Libvirt as a VM "provider", and use Ansible to provision the guest out of the box.

vagrant-libvirt (3rd party): https://github.com/vagrant-libvirt/vagrant-libvirt

ansible provisioning (built in): https://www.vagrantup.com/docs/provisioning/ansible.html

This isn't to say these will make the guest filesystem more performant than NFS; you'll probably hit the same bottlenecks with mounting host filesystems into the guest. The solution many users arrived at with Vagrant is to use a user-space synchronization program like Unison that watches for filesystem events (e.g., inotify) on both the guest and the host. This is its own can of worms: O(size of files monitored) resource use on both sides, file ignore lists, async races, etc. There's even a Unison plugin for Vagrant, but I found it quite finicky and thus unsuitable for large-scale deployment.
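To make those costs concrete, here is a minimal sketch of the change-detection half of such a sync tool. It polls and compares snapshots instead of using inotify, and it is not how Unison works internally; the function names and the ignore patterns are invented for illustration.

```python
import fnmatch
import os


def snapshot(root, ignore=()):
    """Map each relative file path under root to (mtime_ns, size).

    Every scan touches every non-ignored file, which is where the
    O(size of the monitored tree) cost comes from.
    """
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatch(d, pat) for pat in ignore)]
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in ignore):
                continue
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # the file vanished mid-scan: one of the async races
            state[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return (created, modified, deleted) path lists between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, modified, deleted
```

A real tool has to run something like this on both machines, ship the differences across, and decide what to do when both sides changed the same path, which is where most of the finickiness lives.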

unison: https://www.cis.upenn.edu/~bcpierce/unison/

vagrant-unison2: https://github.com/dcosson/vagrant-unison2

At Airbnb, we eventually wrote our own wrapper around Unison to make it easier to use, and saw 90x better perf for filesystem-heavy loads like Webpack once we switched off of NFS in our local VMs.


Thanks for the info, I was aware of vagrant-libvirt and ansible (already use that to deploy vagrant guests) but unison I hadn't seen.

Funnily enough webpack is where I had the most problems (it certainly surfaces "what happens when we smack the FS in the face repeatedly" issues better than most) which is what got me thinking down the KVM route in the first place.

In terms of hacking around on vagrant, it's ruby which isn't on the list of languages I'm fluent in and its flexibility makes it more complex than what I need for my use case.

There is something to be said sometimes for rolling your own for your own itch but I'll certainly have a look at unison as part of vagrant first.


> At Airbnb, we eventually wrote our own wrapper around Unison to make it easier to use

Is there any chance you guys open sourced this wrapper, or have any plans to? :)


Unlikely.

1. I no longer work at Airbnb, so I can't help you there.

2. The wrapper is somewhat deeply embedded in the dev tools swiss army knife, and is written in Ruby, so (a) it's difficult to extract since it's tied to other Airbnb-specific concerns and support libraries, and (b) distributing perf-sensitive Ruby code to end-users is challenging so you might not want it anyway.

You might want to take a look at Mutagen, a unison replacement written in Go which seems to avoid most of what makes Unison annoying to work with. I haven't tested it, though, so I can't vouch for it.

https://github.com/havoc-io/mutagen


That makes sense. I hadn't heard of Mutagen before, but it seems like it would solve a lot of the issues that drove me away from Unison. Thanks for mentioning it!

Here’s a similar project we made for syncing stuff to remote docker hosts:

https://github.com/crgwbr/accordance


Very cool. I always thought a node/TypeScript implementation would be easier than our Ruby one because of the natural event-orientation of Node.

Vagrant-lxc works for us. File sharing is a bind mount so no problems with inotify.

Did it work well for you out of the box, or did you have to do some configuration early on? I haven't gotten this to work for me since I upgraded LXC to 2.0 (something about the `lxc-create` command failing IIRC), but I never spent much time digging into it, so it's entirely possible it's something to do with my setup and not vagrant-lxc.

Same here, near-native performance is quite critical for certain use cases, including kernel compiling. If you've ever tried to compile from source over NFS, it's roughly 10 to 30 times slower than compiling from a native/local filesystem.

Does Vagrant have its own file sharing system, or does it use whatever the backends (mount for lxd, 9p for kvm, proprietary kernel module for virtualbox etc) happen to provide?

It uses different technologies depending on the used provider. For instance Virtualbox shared folders, NFS, SMB or rsync'd folders.

I understand that the Kata folks want this and why. Given what Google is doing with Crostini and the overlap between Kata/Firecracker/crosvm, I'm curious what ChromeOS engineers have to say about virtio-fs.

(And while I'm curious about those folks: what are their expectations around virtio-wl going forward, upstream, etc.?)


I often have the problem that I am in a VirtualBox VM and want to get a file to the host.

I never found a solution other than to upload it somewhere on the internet and then download it on the host.

Yes, I know you can probably configure something before starting the VM to make VM and host share files.

But I usually want to have them completely separated. Only when the need arises to transfer a file would I like to do so ad hoc.

Is there a solution?


You can add shared folders at the host side at runtime and then mount them in the guest. Or you can expose network filesystems (CIFS, NFS) on either side and mount on the other.

As I understand it, host and guest cannot see each other on the LAN. So I think neither is possible.

Change the networking config from NAT to bridged. The VM will now appear as a truly independent device on your LAN, as if it were plugged into a switch.

But then it can talk to the LAN. I don't want that. The whole reason I have some apps running in a VM is to be shielded from them.

Then you need to add a port forward to get through the NAT. However, NAT isn't security and you might be better off using a bridge and a firewall to shield your VM.

VirtualBox adds a network interface to connect host and VM to each other; you can reach them via private IPs. The exact IPs used depend on the type of network mapping used (NAT, host-only, bridged).

That depends on how you configure the networking for the VM.

Yes, I stated that in my original question already.

Not clearly IMHO. If the VM being able to talk to the host is not acceptable to you, even if you only activate that during runtime when you need it, then scp obviously isn't a solution.

If the file is saved on some kind of disk, and not only in memory on your VM, it is already on your host in the disk image of the VM; you just have to extract it. With QEMU I usually loopback-mount the disk in read-only mode and access whatever I want from the guest that way.
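For anyone wondering how the loopback mount finds the partition, here is a small sketch that reads the partition table and prints the byte offsets. It assumes a raw disk image with a classic MBR and 512-byte sectors; other image formats would need converting to raw first, and the function name is just for illustration.

```python
import struct

SECTOR_SIZE = 512        # assumption: 512-byte logical sectors
TABLE_OFFSET = 446       # the MBR partition table starts at byte 446
ENTRY_SIZE = 16          # four entries of 16 bytes each
SIGNATURE = b"\x55\xaa"  # boot signature at bytes 510-511


def partition_offsets(first_sector: bytes):
    """Return [(index, type, byte_offset, byte_length)] for used MBR entries."""
    if len(first_sector) < 512 or first_sector[510:512] != SIGNATURE:
        raise ValueError("not an MBR: missing 0x55AA signature")
    result = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = first_sector[start:start + ENTRY_SIZE]
        # Byte 4 is the partition type; bytes 8-15 hold the start LBA and
        # the sector count, both little-endian 32-bit integers.
        ptype = entry[4]
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype == 0:
            continue  # unused slot
        result.append((i + 1, ptype,
                       start_lba * SECTOR_SIZE, num_sectors * SECTOR_SIZE))
    return result


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as image:
            for idx, ptype, offset, length in partition_offsets(image.read(512)):
                print(f"partition {idx}: type 0x{ptype:02x}, "
                      f"offset={offset}, sizelimit={length}")
```

The printed offset is the value you would hand to the `offset=` option of a read-only loop mount.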

VirtualBox creates .vdi files. Looks like there is no easy way to mount them.

If you go this way, why not simply use SCP on the host?

Where would I scp to?

You open a shell on the host, and then `scp guest:path/to/file target/path/to/file`.

What is 'guest' here? As far as I know, the VM does not have an IP on the LAN.

I mean, you need to be able to use your VM Manager tool to find out what is the IP/hostname of your guest.

A default VirtualBox configuration does not allow direct host -> guest access; the guest is on some hidden VirtualBox-only NAT. You need to change the network interface type to Bridged or Host-only or something else to have access to the VM.

I haven't done it for some time, but in my memory it was neither a problem on OSX nor on Linux with VBox. Maybe try Vagrant? I don't remember, sorry. But ask around and you should find a super easy way to make this happen.

SCP from Guest to Host

What is the IP of the host? I am pretty sure the guest is seeing a LAN that only contains itself.

It sounds like you are talking about NAT mode on virtualbox. In this case you can set up port forwarding in the network settings. Also, you can actually reach the host from the guest. In my case my guest has an IP of 10.0.2.15 by default, and I can reach the host on 10.0.2.2
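The port forwarding route can be sketched roughly like this. The snippet only builds the command lines and does not run anything; it assumes the guest runs an SSH server on port 22, and the rule name, VM name, and host port 2222 are placeholders. Binding the rule to 127.0.0.1 should keep the forwarded port off the LAN.

```python
def natpf_rule(name, host_port, guest_port, proto="tcp",
               host_ip="", guest_ip=""):
    """Format a rule as name,proto,hostip,hostport,guestip,guestport."""
    return f"{name},{proto},{host_ip},{host_port},{guest_ip},{guest_port}"


def forward_ssh_cmd(vm_name, host_port=2222, running=False):
    """Build the VBoxManage call that forwards a host port to guest port 22."""
    rule = natpf_rule("guestssh", host_port, 22, host_ip="127.0.0.1")
    if running:
        # controlvm changes a VM that is already running.
        return ["VBoxManage", "controlvm", vm_name, "natpf1", rule]
    # modifyvm changes the saved configuration of a powered-off VM.
    return ["VBoxManage", "modifyvm", vm_name, "--natpf1", rule]


def scp_from_guest_cmd(user, remote_path, local_path, host_port=2222):
    """Build the scp call, run on the host, that pulls a file out of the guest."""
    return ["scp", "-P", str(host_port),
            f"{user}@127.0.0.1:{remote_path}", local_path]


if __name__ == "__main__":
    # "devbox" and "me" are placeholder names for the VM and the guest user.
    print(" ".join(forward_ssh_cmd("devbox", running=True)))
    print(" ".join(scp_from_guest_cmd("me", "path/to/file", "target/path/to/file")))
```

Once the file is copied, the rule can be removed again, which keeps the ad hoc flavor the original question asked for.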

Any chance of getting a kext for this filesystem on macOS, if it's the host? I'm still searching for the holy grail for using containers in development, and the 9p implementation used by most docker-on-mac setups is extremely slow.

In theory file system drivers for this can be implemented for any operating system that allows third-party file systems. virtio-fs is based on FUSE, so a starting point would be existing macOS FUSE implementations (I haven't checked the status) and then adding the virtio device plumbing.
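To show what "based on FUSE" means at the message level, here is a rough sketch of the header that starts every FUSE request, following `struct fuse_in_header` in Linux's `include/uapi/linux/fuse.h`, plus a LOOKUP request built on top of it. Little-endian byte order is assumed here for illustration, and the helper names are invented; the virtio queue plumbing that would carry these bytes is not shown.

```python
import struct

# struct fuse_in_header: len, opcode (u32); unique, nodeid (u64);
# uid, gid, pid, padding (u32). 40 bytes in total.
FUSE_IN_HEADER = struct.Struct("<IIQQIIII")

FUSE_LOOKUP = 1    # opcode: resolve a name inside a directory
FUSE_GETATTR = 3   # opcode: fetch attributes of a node
FUSE_ROOT_ID = 1   # node id of the file system root


def build_lookup(unique, parent_nodeid, name, uid=0, gid=0, pid=0):
    """Build a LOOKUP request: header followed by a NUL-terminated name."""
    payload = name.encode() + b"\0"
    header = FUSE_IN_HEADER.pack(
        FUSE_IN_HEADER.size + len(payload),  # total message length
        FUSE_LOOKUP, unique, parent_nodeid, uid, gid, pid, 0)
    return header + payload


def parse_header(message):
    """Decode the fixed header at the start of a request."""
    length, opcode, unique, nodeid, uid, gid, pid, _pad = \
        FUSE_IN_HEADER.unpack_from(message)
    return {"len": length, "opcode": opcode, "unique": unique,
            "nodeid": nodeid, "uid": uid, "gid": gid, "pid": pid}
```

A driver for another operating system would mostly need to produce and consume messages like these and move them over the virtio device instead of /dev/fuse.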

I have not tried it yet but docker-sync is said to solve the problem of slow io.

This is very exciting. Once it lands, I'll change virtme [0] to use this when available.

[0] https://github.com/amluto/virtme


How does this compare to Gluster?

GlusterFS is a distributed storage system. virtio-fs is a shared file system for virtual machines.

You can expose GlusterFS directory trees to virtual machines using virtio-fs but virtio-fs itself doesn't do the network communication or policy decisions about where and how data is stored on the network - that's the job of GlusterFS.

The reason you might want to use both together is to hide the details of the GlusterFS configuration from the virtual machine. That way the virtual machine only sees a virtio-fs device and doesn't have network access to your GlusterFS cluster.


Isn't the big difference that all your clients have to be on the same physical hardware as your server?

And that makes virtio-fs a lot simpler than Gluster. It doesn't have to deal with split brain, replication, etc.

I wonder why they don't use syncthing.

Can't we modify Syncthing so that it becomes like a low-latency shared filesystem and uses BitTorrent as an underlying protocol?

Or is BitTorrent not fit for making a shared/distributed file system?


> Can't we modify Syncthing so that it becomes like a low-latency shared filesystem and uses BitTorrent as an underlying protocol?

Well... at that point you're writing new software, so you might as well start from the ground up with something actually designed for this use case, which provides support for things like live coherency, writeback and caching.

It's not easy to just make a program "become" low-latency. Syncthing is extremely high-latency to begin with, and is specifically not meant to be used as a real-time sync application which multiple applications can depend on for live coherency. This is the design of Syncthing. Same for BitTorrent: tons of overhead due to the design of the system. For something like this which requires true low latency, there are just too many design decisions which would get in the way.

One major pain point about Syncthing in particular is that there isn't a very good way to maintain complete consistency between different OS environments. Syncing files from Linux which are not valid in Windows (name, metadata, etc.) either causes filesystem corruption, reduces coherency between guests and host, or just straight-up drops incompatible files.

Try copying a file from Linux to Windows which has a trailing space. Windows will go bonkers and cannot delete or modify this file. You have to move it to a partition which you then delete.
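A sync tool can at least flag such names before they reach the Windows side. Here is a minimal sketch of that check, based on the naming rules Microsoft documents for Win32 paths (reserved characters, reserved device names, and trailing spaces or periods); the function name is made up and this is not something Syncthing provides.

```python
import re

# Characters that Win32 file names may not contain, plus control characters.
_RESERVED_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names that are reserved, with or without an extension.
_RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL",
                   *(f"COM{i}" for i in range(1, 10)),
                   *(f"LPT{i}" for i in range(1, 10))}


def windows_name_problems(name):
    """Return a list of reasons why a single path component is unsafe on Windows."""
    if not name:
        return ["empty name"]
    problems = []
    if _RESERVED_CHARS.search(name):
        problems.append("contains a reserved character")
    if name[-1] in " .":
        problems.append("ends with a space or period")
    if name.split(".")[0].upper() in _RESERVED_NAMES:
        problems.append("uses a reserved device name")
    return problems
```

An empty list means the name passed these particular checks; it does not cover metadata, path length, or case-insensitive collisions.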

I was so frustrated with using Syncthing that I gave up trying to make it useful.


You are implying that you want to store the data twice on the host and the guest. The proposed virtio-fs (and also the existing virtio-9p) is about passing through a subtree hierarchy stored on the host filesystem into the guest.


