
Virtio-fs: shared file system for virtual machines - diegocg
https://lore.kernel.org/lkml/20181210171318.16998-1-vgoyal@redhat.com/
======
mjb
Some background reading that may help this make sense to folks from
outside the domain: [https://www.ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf](https://www.ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf)

Block storage (virtio-block) is way simpler than file systems, both in the
virtio implementation and in the host filesystem implementation. Simple is
good, both for security reasons and because it makes it easier to reason about
performance isolation between guests. The cost is obviously in features, and
in making it hard to share data (and very hard to multi-master data) between
multiple VMs. To see how simple virtio-block is, take a look at Firecracker's
implementation ([https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/block.rs](https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/block.rs)).
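To make the simplicity point concrete, here is a minimal Python sketch of the device side of one virtio-blk request. It assumes the descriptor chain has already been split into the 16-byte request header, a data buffer, and the status byte. The constants follow the virtio-blk section of the virtio spec; `handle_request` and the in-memory `bytearray` disk are hypothetical stand-ins, not Firecracker's code, and real devices also do virtqueue bookkeeping, `GET_ID`, discard, and so on.

```python
import struct

SECTOR_SIZE = 512  # virtio-blk addresses the disk in 512-byte sectors

# Request types and status codes from the virtio-blk section of the virtio spec.
VIRTIO_BLK_T_IN = 0     # device -> guest (read)
VIRTIO_BLK_T_OUT = 1    # guest -> device (write)
VIRTIO_BLK_T_FLUSH = 4  # flush any volatile write cache
VIRTIO_BLK_S_OK = 0
VIRTIO_BLK_S_IOERR = 1
VIRTIO_BLK_S_UNSUPP = 2

# Request header: le32 type, le32 reserved, le64 sector (16 bytes, little-endian).
HEADER = struct.Struct("<IIQ")


def handle_request(disk: bytearray, header: bytes, data: bytearray) -> int:
    """Service one request against an in-memory disk; return the status byte.

    For reads, `data` is the guest-writable buffer and is filled in place.
    For writes, `data` holds the bytes the guest wants stored.
    """
    req_type, _reserved, sector = HEADER.unpack(header)
    offset = sector * SECTOR_SIZE
    if req_type in (VIRTIO_BLK_T_IN, VIRTIO_BLK_T_OUT):
        end = offset + len(data)
        if end > len(disk):
            return VIRTIO_BLK_S_IOERR  # request runs past the end of the disk
        if req_type == VIRTIO_BLK_T_IN:
            data[:] = disk[offset:end]
        else:
            disk[offset:end] = data
        return VIRTIO_BLK_S_OK
    if req_type == VIRTIO_BLK_T_FLUSH:
        return VIRTIO_BLK_S_OK  # nothing is cached in this in-memory model
    return VIRTIO_BLK_S_UNSUPP  # e.g. GET_ID or DISCARD: not implemented here
```

Everything the guest can ask for is a sector number, a direction, and a buffer, which is a much smaller surface than a file-system protocol with names, permissions, and locking.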

The existing alternative, as the post discusses, is 9p, which has a lot of
shortcomings, so this is interesting. I suspect its use is mostly going to be
in development environments and client-side uses of containers, while
server-side users of virtualization will likely stick with block (or a
filesystem implemented in guest userland).

~~~
ori_b
> _I suspect its use is mostly going to be in development environments and
> client-side uses of containers_

Containers don't need this. It's only useful for virtual machines.

~~~
geofft
The envisioned use case is Kata Containers
[https://katacontainers.io](https://katacontainers.io) (formerly "Clear
Containers"), which behaves like a container runtime but is actually a virtual
machine with its own kernel to enhance isolation from the host. Since it runs
its own kernel, it needs something like this to access files on the host.

The advantage compared to running normal VMs is that it's interoperable with
things that expect a container runtime like Docker or rkt - e.g., Kubernetes
can run Kata Containers - and the overhead / density / startup speed is much
closer to that of actual containers (in the sense of namespaces+cgroups) than
traditional VMs.

~~~
ori_b
And the advantage compared to running containers is...?

~~~
littlekosh
> a virtual machine with its own kernel to enhance isolation from the host

Whether that’s something you care about or think is necessary is left as an
exercise to the reader (i.e. it depends).

------
RussianCow
The email mostly talks about performance, but could this technique finally
give us VM shared folders with proper inotify support (at least on Linux
hosts)? That has always been my number one gripe with using Vagrant for local
development.

~~~
noir_lord
Mine as well. I have this horrible NFS setup that kinda/sorta works, but this
(if what you're asking about turns out to be the case) would be awesome.

I even looked at writing my own 'vagrant' (to fit exactly my use case of Linux
on Linux via KVM and deployed via ansible) because the NFS thing and
VirtualBox issues mar what is otherwise a great idea.

~~~
jitl
Vagrant is quite extensible; you can plug in KVM via QEMU and Libvirt as a VM
"provider", and use Ansible to provision the guest out of the box.

vagrant-libvirt (3rd party): [https://github.com/vagrant-libvirt/vagrant-libvirt](https://github.com/vagrant-libvirt/vagrant-libvirt)

ansible provisioning (built in):
[https://www.vagrantup.com/docs/provisioning/ansible.html](https://www.vagrantup.com/docs/provisioning/ansible.html)

This isn't to say these will make the guest filesystem more performant than
NFS; you'll probably hit the same bottlenecks with mounting host filesystems
into the guest. The solution many users arrived at with Vagrant is to use a
user-space synchronization program like Unison that watches for filesystem
events (e.g., inotify) on both the guest and the host. This is its own can of
worms: O(size of files monitored) resource use on both hosts, file ignore
lists, async races, etc. There's even a Unison plugin for Vagrant, but I found
it quite finicky and thus unsuitable for large-scale deployment.

unison:
[https://www.cis.upenn.edu/~bcpierce/unison/](https://www.cis.upenn.edu/~bcpierce/unison/)

vagrant-unison2: [https://github.com/dcosson/vagrant-unison2](https://github.com/dcosson/vagrant-unison2)

At Airbnb, we eventually wrote our own wrapper around Unison to make it easier
to use, and saw 90x better perf for filesystem-heavy loads like Webpack once
we switched off of NFS in our local VMs.
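The per-file bookkeeping behind a Unison-style sync is why its resource use grows with the monitored tree. Here is a minimal Python sketch of the change-detection half, assuming plain polling (walk plus `stat`) instead of inotify and comparing only size and mtime. `take_snapshot` and `diff` are hypothetical helpers, not Unison's or Mutagen's actual API.

```python
import os
from typing import Dict, List, Tuple

# Relative path -> (size in bytes, mtime in nanoseconds).
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Walk `root` and record size and mtime for every file found."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                # Deleted between listing and stat (or a dangling symlink):
                # one of the races a real sync tool has to tolerate.
                continue
            snap[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return snap


def diff(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return sorted (added, removed, modified) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, modified
```

A two-way tool keeps a snapshot like this for both the guest side and the host side, reconciles the two diffs, and applies ignore lists before copying anything, which is where the memory use and the races come from.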

~~~
RussianCow
> At Airbnb, we eventually wrote our own wrapper around Unison to make it
> easier to use

Is there any chance you guys open sourced this wrapper, or have any plans to?
:)

~~~
jitl
Unlikely.

1\. I no longer work at Airbnb, so I can't help you there.

2\. The wrapper is somewhat deeply embedded in the dev tools Swiss Army knife,
and is written in Ruby, so (a) it's difficult to extract since it's tied to
other Airbnb-specific concerns and support libraries, and (b) distributing
perf-sensitive Ruby code to end users is challenging, so you might not want it
anyway.

You might want to take a look at Mutagen, a Unison replacement written in Go
which seems to avoid most of what makes Unison annoying to work with. I
haven't tested it, though, so I can't vouch for it.

[https://github.com/havoc-io/mutagen](https://github.com/havoc-io/mutagen)

~~~
RussianCow
That makes sense. I hadn't heard of Mutagen before, but it seems like it would
solve a lot of the issues that drove me away from Unison. Thanks for
mentioning it!

------
colemickens
I understand that the Kata folks want this and why. Given what Google is doing
with Crostini and the overlap between Kata/Firecracker/crosvm, I'm curious
what ChromeOS engineers have to say about virtio-fs.

(And while I'm curious about those folks: what are their expectations
around virtio-wl going forward, upstream, etc.?)

------
TicklishTiger
I often have the problem that I am in a VirtualBox VM and want to get a file
to the host.

I never found a solution other than to upload it somewhere on the internet and
then download it on the host.

Yes, I know you can probably configure something _before_ starting the VM to
make VM and host share files.

But I usually want to keep them completely separated. Only when the need
arises to transfer a file would I like to do so, ad hoc.

Is there a solution?

~~~
Cei5ouko
You can add shared folders at the host side at runtime and then mount them in
the guest. Or you can expose network filesystems (CIFS, NFS) on either side
and mount on the other.

~~~
TicklishTiger
As I understand it, host and guest cannot see each other on the LAN, so I
think neither is possible.

~~~
Unklejoe
Change the networking config from NAT to bridged. The VM will now appear as a
truly independent device on your LAN, as if it were plugged into a switch.

~~~
TicklishTiger
But then it can talk to the LAN. I don't want that. The whole reason I have
some apps running in a VM is to be shielded from them.

~~~
Spivak
Then you need to add a port forward to get through the NAT. However, NAT isn't
security and you might be better off using a bridge and a firewall to shield
your VM.

------
ninkendo
Any chance of getting a kext for this filesystem on macOS, if it's the host?
I'm still searching for the holy grail for using containers in development,
and the 9p implementation used by most docker-on-mac setups is extremely slow.

~~~
stefanha
In theory file system drivers for this can be implemented for any operating
system that allows third-party file systems. virtio-fs is based on FUSE, so a
starting point would be existing macOS FUSE implementations (I haven't checked
the status) and then adding the virtio device plumbing.
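Since virtio-fs carries FUSE messages, the core of any port is speaking that protocol: each request is a fixed header followed by an opcode-specific body, and each reply starts with a small header echoing the request ID. A minimal Python sketch of decoding a request and building an error reply, assuming the header layouts from Linux's `<linux/fuse.h>` and little-endian byte order. `Request`, `parse_request`, `error_reply`, and `handle` are hypothetical names; the sketch omits the virtqueue plumbing and the `FUSE_INIT` handshake a real server must answer first.

```python
import errno
import struct
from typing import NamedTuple

# Header layouts from Linux's <linux/fuse.h>; little-endian is assumed here.
IN_HEADER = struct.Struct("<IIQQIIII")  # len, opcode, unique, nodeid, uid, gid, pid, padding (40 bytes)
OUT_HEADER = struct.Struct("<IiQ")      # len, error (0 or negative errno), unique (16 bytes)

FUSE_LOOKUP = 1  # body: NUL-terminated name to look up inside `nodeid`


class Request(NamedTuple):
    opcode: int
    unique: int   # request ID; the reply must echo it
    nodeid: int   # inode the operation applies to
    uid: int
    gid: int
    pid: int
    body: bytes   # opcode-specific payload after the header


def parse_request(buf: bytes) -> Request:
    length, opcode, unique, nodeid, uid, gid, pid, _pad = IN_HEADER.unpack_from(buf)
    return Request(opcode, unique, nodeid, uid, gid, pid, bytes(buf[IN_HEADER.size:length]))


def error_reply(unique: int, err: int) -> bytes:
    # Failures carry no payload, just a header with a negated errno.
    return OUT_HEADER.pack(OUT_HEADER.size, -err, unique)


def handle(buf: bytes) -> bytes:
    """Toy server that exports an empty directory and implements nothing else."""
    req = parse_request(buf)
    if req.opcode == FUSE_LOOKUP:
        return error_reply(req.unique, errno.ENOENT)  # no name exists
    return error_reply(req.unique, errno.ENOSYS)
```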

------
amluto
This is very exciting. Once it lands, I'll change virtme [0] to use this when
available.

[0] [https://github.com/amluto/virtme](https://github.com/amluto/virtme)

------
brightball
How does this compare to Gluster?

~~~
geofft
Isn't the big difference that all your clients have to be on the same physical
hardware as your server?

~~~
mistaken
And that makes virtio-fs a lot simpler than Gluster. It doesn't have to deal
with split brain, replication, etc.

------
sexyrouter
I wonder why they don't use Syncthing.

Can't we modify Syncthing so that it becomes like a low-latency shared
filesystem and uses BitTorrent as an underlying protocol?

Or is BitTorrent not fit for building a shared/distributed file system?

~~~
kakarot
> Can't we modify Syncthing so that it becomes like a low-latency shared
> filesystem and uses BitTorrent as an underlying protocol?

Well... at that point you're writing new software, so you might as well start
from the ground up with something actually designed for this use case, which
provides support for things like live coherency, writeback and caching.

It's not easy to just make a program "become" low-latency. Syncthing is
extremely high-latency to begin with, and is specifically not meant to be used
as a real-time sync application that multiple applications can depend on for
live coherency. That's by design. The same goes for BitTorrent: tons of
overhead due to the design of the system. For something like this, which
requires true low latency, there are just too many design decisions that
would get in the way.

One major pain point about Syncthing in particular is that there isn't a very
good way to maintain complete consistency between different OS environments.
Syncing files from Linux that are not valid on Windows (name, metadata, etc.)
either causes filesystem corruption, reduces coherency between guests and
host, or just drops the incompatible files outright.

Try copying a file from Linux to Windows which has a trailing space. Windows
will go bonkers and cannot delete or modify this file. You have to move it to
a partition which you then delete.
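Any sync tool or shared-folder layer that crosses from Linux to Windows has to decide what to do with names like that. A minimal Python sketch of a pre-sync check, assuming the commonly documented Win32 naming rules (forbidden characters, reserved device names such as `CON`, no trailing space or dot). `windows_problems` is a hypothetical helper, not part of Syncthing, and it does not cover every rule (path-length limits, for example).

```python
import re
from typing import List

# Characters Win32 rejects in file names, plus ASCII control characters 0-31.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Legacy device names, reserved with or without an extension (e.g. "CON.txt").
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_problems(name: str) -> List[str]:
    """List reasons a single path component would be unsafe to create on Windows."""
    problems = []
    if _FORBIDDEN.search(name):
        problems.append("forbidden character")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    if name.split(".", 1)[0].upper() in _RESERVED:
        problems.append("reserved device name")
    return problems
```

A sync tool could run this on every incoming name and then rename, skip, or report the file; whichever it picks is exactly the coherency trade-off described above.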

I was so frustrated with using Syncthing that I gave up trying to make it
useful.

