Mounting a guest fs or syncing it to the host is rife with security footguns.
I've developed almost exclusively in VMs for over a decade. One reason I use VMs is to isolate the execution context for development and deployment. I used to pass the filesystem through from the host, but poor performance and the lack of inotify were DX barriers. Passing the FS through to the host is a no-go because of subtle executable things like git hooks that could enable sandbox escape.
The simplest and best approach I've found is to use a git remote on the host to push branches to/from the guest sandbox. I can commit on the sandbox fs and treat the host as an upstream remote. On the host I pull from the sandbox and push up to GitHub/etc. It's a bit more process but becomes second nature quickly and requires no extra tools. This also works well for remote servers.
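Concretely, the host side of that flow looks something like this (remote names, hostnames, and paths here are just examples):

```
# On the host: register the guest checkout as a remote and pull from it
git remote add sandbox ssh://dev@sandbox-vm/home/dev/project
git fetch sandbox
git checkout -b my-feature sandbox/my-feature   # or merge into an existing local branch
git push origin my-feature                      # then forward to GitHub/etc.
```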
Another approach I've used is lsyncd to sync files from the host to the guest (Mutagen is another cool syncing tool). In practice, though, I've found syncing to be a footgun too. It's too easy to edit a file on the host and blow away a change inside the guest with no undo. This is one reason I've found explicit git push / pull to be cleaner.
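For reference, the lsyncd setup was essentially a one-liner like the following (paths and hostname are examples), with the overwrite caveat above applying in full:

```
# Continuously mirror the host's working copy into the guest over rsync+ssh
lsyncd -rsyncssh ~/project dev@guest-vm /home/dev/project
```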
I find it exceptionally annoying to constantly commit and push, and it's entirely impossible if your IDE lives on the host.
I have started using Syncthing to share the entire workspace between host and VM and it works great. It's near-instant, works even between Windows and Linux, and the sync stays entirely local.
Yeah, the conflict files are no fun. You can, however, set up a synced folder as "receive only", which is neat if you have your IDE on the host and just run things in the VM. I run it in two-way mode, though, because of cache file syncing and such.
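If clicking through the GUI gets old, the folder type can also be set via Syncthing's REST config API; a sketch assuming a folder id of "workspace" and the default GUI address:

```
# Mark the VM-side copy of the folder as receive-only (host stays the source of truth)
curl -X PATCH \
  -H "X-API-Key: $SYNCTHING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"type": "receiveonly"}' \
  http://localhost:8384/rest/config/folders/workspace
```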
Also, while I’m at it – if you’re using VMs for isolating potentially insecure things then your threat model is obviously different. I should have mentioned that I’m using VMs mainly for practical reasons, e.g. for easily spinning up Ubuntu environments with my preferred toolset. If you’re worried about something escaping the VM sandbox via git-hooks then your use case is of course very different.
My motivation for sandboxing is to fearlessly tinker with unfamiliar dependencies without thinking too deeply about the supply chain. Starting with isolation makes it safer to experiment, which is important to me for velocity.
Fair enough. I have clarified my use case at the top of my article to reflect my threat model. I don’t use the VM to run tricksy software I don’t trust so the sandboxing argument doesn’t really apply in my case (or rather it doesn’t apply any more than when running software directly on the host). All of the software I run on my VMs is the same software I would run on my host (if the host was Linux-based).
Likewise, fair enough. I appreciate you clarifying your threat model. I try to operate with low trust dev environments whenever possible, which is a major motivation for using VMs. It only takes one malicious npm or pip package for it to pay off.
Yeah, I hear you. I try to avoid pulling NPM packages that I don’t know so I’m not really interested in containing software for security purposes.
Of course, there’s also the question of dependencies. If I’m working on a project that I trust that pulls in hundreds of NPM dependencies, I implicitly trust that the project deemed those dependencies safe to pull in. It would be impossible to always operate on the assumption that you’re potentially pulling in hundreds of malicious packages so the chain of trust has to take over at some point.
However, I imagine you were more or less referring to individual projects that you don’t fully trust and like to experiment with, in which case I agree with you and would probably use a VM without any kind of file sharing, tbh.
An addendum: if you’re worried about the guest setting up a malicious git hook that your host would then execute, then your threat model is obviously very different from mine. Perhaps I should have clarified that I’m running code that I trust inside the VM so sandbox escape is not really a security issue in my case (but rather an operational one if, for example, a misbehaving script causes havoc on the host in this way).
For example, let's say I've got a git repo on a guest VM and I'm syncing the dir to the host. If I run `git push` on the host (because my ssh key is there), it implicitly runs any pre-push script dropped there by the guest, defeating any isolation.
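To make the failure mode concrete, here's a harmless stand-in for what the guest could drop into the synced checkout (the payload is just a marker file for illustration):

```
# Placed by the guest inside the shared working copy:
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
touch /tmp/i-ran-on-the-host   # arbitrary code, executed by the host's `git push`
EOF
chmod +x .git/hooks/pre-push
```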
Do you run an IDE inside the guest or the host (through something like VS Code Remote)?
Keeping the IDE inside the VM would feel more appropriate for your isolation, but would be a continual source of friction to keep IDE plugins/settings/tools in sync. Using a remote development workflow feels like it would eliminate many isolation guarantees.
PhpStorm has JetBrains Gateway, which allows running the IDE on the VM and connecting to it using a client on the host. However, I found this to be a rather clunky approach when working on a local VM. I could imagine it working well if you’re developing on a remote machine, but for local use cases it feels overengineered.
I mainly use GNU/Linux, but sometimes I develop with a VM running Windows. I mostly live inside Emacs. A nice way to access my Windows system is through TRAMP, which (T)ransparently extends the filesystem at the application level over various protocols. I have found that using Emacs as a Samba client to access Windows works quite well for me.
Unfortunately, recent versions of Samba make it hard (impossible?) to spawn remote processes; otherwise I would be able to use eshell inside Emacs, which is pretty awesome over SSH.
I doubt that last assertion; the Impacket tooling is well known for executing code over various interfaces. I have no idea how it works, but I imagine it would be difficult to truly prevent exec over SMB.
FWIW though, Windows now supports OpenSSH via a PowerShell command to install it (i.e. it’s not some third-party binary). I’m not sure if it supports SFTP, but it may be worth looking into.
I did try using OpenSSH, but there were issues with using PowerShell through eshell over TRAMP, which expects a POSIX shell. I also tried using Cygwin to run OpenSSH, but that didn't work well for my needs.
TRAMP works well for simple use cases, like quickly editing a config or viewing some files. For full-blown project development, I found numerous issues with TRAMP. Many plugins that work well with local files might not work well with remote files. That's not TRAMP's fault, though; not every plugin author has time to support TRAMP well. In the end, I went with a mosh + Emacs setup, which just works once you spend some time setting up[1] your terminal.
It’s more like Emacs-flavored RPC. You can edit files on remote hosts using TRAMP, and you can transparently execute processes on the remote host. If you’re editing a remote file and invoke, say, a compiler, it will transparently run the compiler on the remote host against the remote file.
Similar, yes. It supports many more protocols than SSH, though[0]. It also supports multi-hop connections, even across multiple protocols where possible[1]. So you could, say, access a remote server over SSH and then pivot to a Windows machine behind a firewall using Samba, all by simply specifying a file path.
TRAMP is general remote access via SSH; e.g. you can also use it to execute binaries on the remote machine. For remote files it does something like `base64 -D <(ssh remote base64 remote-file) > tempfile` for reads and the inverse for writes.
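In practical terms, you can hand Emacs a TRAMP path directly (user, host, and path here are examples):

```
# Opens the file over SSH via TRAMP; compile/shell commands invoked from that
# buffer then run on guest-vm rather than locally
emacs "/ssh:dev@guest-vm:/home/dev/project/src/main.c"
```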
I’ve used a VMWare Fusion shared folder between a Mac host and Linux/Windows guests for several years with zero issues. Performance has generally been very good, basic permissions work (haven’t needed anything beyond rwx), and I gain the ability to pretty freely move between host and guest with ease. For example, my Linux guest is headless, so I do most of my editing in a GUI editor or IDE on the Mac side on shared files.
I also have some little quality-of-life scripts on the Linux side for things like editing shared files in the Mac editor, opening shared folders in Finder, etc. These make it a fairly seamless experience to go back and forth between the two OSes.
Depends on your requirements, I guess. I wanted a solution that would let me work with Linux ACLs and symlinks and get the best possible performance, so going native was basically the only choice. I started from that premise and worked around it. That left only two alternatives: file syncing or sharing a directory with the host.
> I’ve used a VMWare Fusion shared folder between a Mac host and Linux/Windows guests for several years with zero issues. Performance has generally been very good, basic permissions work (haven’t needed anything beyond rwx), and I gain the ability to pretty freely move between host and guest with ease.
Not for me! I did that and one day ran a Git command on the Linux side of a folder shared from the host.
It crashed Git!
This turned out to be reproducible. It was quietly corrupting my local Git repo too.
Turns out the VMware Fusion shared folder has badly broken caching in its Linux client, with no option to turn the caching off.
So I switched to using SMB and Samba to share a Linux guest folder with the Mac host. That mostly worked, but for 9 months I found that a random file would occasionally just disappear, without me noticing for a few days.
I kept losing data, about once a month at random. I thought there must be a subtle bug in my application's careful handling of those files. The concern delayed production deployment for 9 months. But no: when running on a proper Linux server, my application code had been fine all along.
It turned out the Mac SMB client had an awful bug that caused it to very occasionally delete random files in busy directories. The files had no relation to the files being operated on, other than being in the same folder and recently used. It wasn't an operation race; it was blatantly sending the wrong file id for an operation on a different file by another process in parallel. Debugging that was intense. For months I had no idea, and thought it was my fault I lost data about once a month. Eventually I managed to trigger the fault more reliably with an artificial load test, and then it took a solid day of tracing everything you can imagine to confirm the cause was the Mac SMB client sending an egregiously wrong command on rare occasions.
Because the shared folder's random file deletions were so vexing, rare, and distressing, I investigated all sorts of causes, and in the process found another. There was a sequence where Emacs (GUI Emacs on the Mac host) asks to confirm editing a file "locked" by another Emacs (not really, just stale state), and if I gave the wrong answer sequence, aborted at the wrong moment, and had editor backups configured a particular way, the file ended up removed unexpectedly. I figured: jackpot! I'd found the cause of the disappearing data: a race in Emacs in rare circumstances. I thought I must have triggered that sequence a few times, so I fixed it in my Emacs with great relief.
Then a month later another file was missing, and I realised there was a second cause of disappearing files. That turned out to be the crazy bug in the Mac SMB client.
I have shared folder solutions that seem reliable enough now, both for sharing host folders with the guest and guest folders with the host. It involves NFS, plus duct tape with Samba to translate permissions because macOS doesn't support NFSv4, and CIFS (SMB version 1, obsolete) because later SMB versions had some problem. It's not great. The permissions don't map well, and I had to change the config the last time I upgraded macOS.
All because VMware Fusion shared folder had such poor cache coherency.
> I also have some little quality-of-life scripts on the Linux side for things like editing shared files in the Mac editor, opening shared folders in Finder, etc. These make it a fairly seamless experience to go back and forth between the two OSes.
I have little scripts like yours too. They're great! Being able to edit a file in the Linux guest using the Mac GUI version of Emacs is really helpful, as is being able to open a PDF or DOC or XLS that's stored in the Linux guest. That sort of thing is the motivation for sharing my Linux home folder with the macOS host.
Oh, I also use Emacs TRAMP in a way that's so seamless I forget I'm using it. When editing Linux guest files at the path where the Linux home folder is mounted on the host, Emacs is configured to go transparently over SSH to the guest instead. It works so well I forget I'm using TRAMP, and it avoids the weird file permission issues and potential bugs like those I've seen before when writing to the shared folder directly. It also means that when I run a shell command or compile etc. from Emacs with a buffer open on a Linux file path, the command runs on the guest, which is perfect.
If it's still possible to get sshfs running on a Mac, I'd suggest that might be a much better solution than Samba or NFS, or any of the various other VM file sharing solutions. It's been a while since I used this setup on a daily basis, but with caching disabled it was really dependable.
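Something along these lines, assuming macFUSE and sshfs are installed (hostname and paths are examples):

```
# Mount the guest's project tree on the Mac host with sshfs's caching disabled
sshfs -o cache=no,reconnect,ServerAliveInterval=15 \
  dev@guest-vm:/home/dev/project ~/guest-project

# Unmount when done
umount ~/guest-project
```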
I’ll definitely have to try this. SSHFS didn’t work well for me when sharing files from the host to the guest, but it might just work when sharing from the guest.
Hard to trust this when the conclusion is to use NFS instead of VirtioFS. I don't buy that NFS over the network to a VM is faster than 9p, and it certainly isn't faster than VirtioFS. Though "share from the VM" is good advice for NFS, since you can just reboot the guest when some part of NFS inevitably hangs (not-so-distant trauma here, including learning how to force power-off when shutdown is broken by NFS).
The point I was trying to make in the article was more about reversing the direction of the mounts: instead of mounting a host directory on the guest, you share a directory from the guest and mount it with a client on the host. You can achieve this with whatever you like – NFS, Samba, etc. This is obviously not a new concept, but I find it astonishing that perhaps the majority of articles on the internet talk about mounting a host directory on the guest, which is problematic for the reasons I covered in my article.
The main advantage of this approach is that your code resides on a native file system. This lets you take advantage of the performance gain where it matters most, which for me is at runtime (I develop web applications, and IO performance is of paramount importance). I care much, much less about performance on the host side.
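As a rough sketch of the guest-exports, host-mounts direction over NFS (addresses and paths are examples; macOS typically wants `resvport` unless the export is marked `insecure`):

```
# On the Linux guest: export the project tree to the host-only network address
echo '/home/dev/project 192.168.64.1(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# On the macOS host: mount the guest's export
sudo mount -t nfs -o resvport,nolocks guest-vm:/home/dev/project ~/guest-project
```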
Sharing VirtioFS "chroots" that live on top of ZFS is golden for simplifying backup strategies and reducing bloat, while still being able to maintain workload and network isolation.
I'm not backing up entire VMs through ZFS snapshots or VM snapshots, and I don't need a ZFS-enabled backup target either. I'm merely backing up the relevant files (e.g. compose stack definitions, config files, data files) that are in the chroot, from the VM host, to GDrive, using popular tooling such as duplicacy.
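For what it's worth, the file-level backup boils down to something like this (snapshot id, Drive folder, and path are placeholders):

```
# Initialize once in the chroot directory on the VM host, then run backups on a schedule
cd /tank/chroots/appstack
duplicacy init appstack gcd://vm-backups/appstack
duplicacy backup -stats
```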
Also, regarding VirtioFS: this is indeed something I have not yet tried. I’m wary of it, though, because it requires installing additional drivers on Windows, and I usually prefer using tools that are already at my disposal (e.g. Samba on Windows).
Not Windows specifically. I work on a macOS host so NFS works fine. However, I want to keep my options open for working on a Windows host if necessary and I wanted to keep to the toolset that already exists on the platform. I want to isolate as many responsibilities as possible inside the VM and then use a client on the host to mount the directory. Bonus points if the host already comes with this client out of the box.
I didn’t know about Guestfish, thanks for the reference!
Yes, virtio-fs is an option but I should have added that one of my goals was to not lock myself into a platform… I want to have the possibility of running my VMs on Windows as well and Samba seems like the easiest choice there. Just spin up a Samba share inside the VM, mount it on the Windows host and you’re good to go. No need to install any additional dependencies on the host.
I’ve tried the Windows NFS driver for Vagrant a while back and my conclusion is that it’s best not to force a square peg into a round hole… Not to mention that these custom solutions can experience software rot throughout the years.
I used to have this problem. I ended up running an Ubuntu desktop VM on the Windows machine and developed and tested in the VM. Then I realized I didn’t need Windows in the way (or spying on me), so I went native Ubuntu desktop.
This doesn’t work for me because I find the Ubuntu desktop experience to be quite horrible. It’s very janky way too often. I very much prefer working with headless Ubuntu instances since this is where Linux excels for my use cases. For GUI stuff I look elsewhere.
Given how prevalent remote editing workflows have become (VS Code Remote, and I'm sure there are others), I 100% agree with this advice. If you're working on projects on a remote VM, just store your project files there. It's of course critical to have a backup workflow. I have a bunch of development VMs on a bare-metal Hetzner Proxmox instance, which is completely provisioned using Ansible.
Many years ago, I experimented with all sorts of syncing mechanisms from my Windows desktop to a remote Linux VM, and the sync lag was just unbearable. I'm much happier when the files live on the development VM.
I kind of went overboard and planned a disaster recovery strategy for those development VMs, even though there's not really a strict SLA required. I can provision those VMs using Ansible, but any work in progress that wasn't pushed to GitHub is at risk. To address that, I've got hourly rsync jobs replicating my home directory to a rsync.net account in Zurich (why Zurich? it gives me the lowest latency from the Hetzner server in Germany). I then asked rsync.net to enable 48 hourly snapshots on that account (with additional weekly and monthly ones). It has become essentially impossible to accidentally lose data, and I feel very confident working for days on a git feature branch, just committing locally, or perhaps not even committing, given I've got those hourly snapshots. My bare-metal server could vaporize and I'd be back up and running in hours. Even if the Hetzner datacenter were hit by a nuclear weapon, I would instantiate a VM on any cloud provider, provision it with Ansible, and be back up and working within hours.
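The replication itself is nothing fancy; a crontab entry roughly like this (account name and paths are placeholders):

```
# Hourly push of the home directory to rsync.net; their server-side snapshots keep the history
0 * * * * rsync -az --delete -e ssh /home/dev/ zh1234@zh1234.rsync.net:dev-home/
```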
I've spent WAY too much time working on my little disaster recovery scenario but it's been insanely fun.
I've been experimenting with working in a GCP VM for a few months and like it quite a bit. Between ssh+tmux+nvim and vscode-remote, I have a remote and local dev experience that is basically identical.
Syncthing is my key to keeping my ~/git tree in sync between the two environments. You just have to be careful not to flip back and forth quickly between local and remote, as Syncthing delays can be ~10s.
I always had trouble when using Syncthing for .git folders: they usually end up corrupted or messed up if I push on both the local and remote sides.
I use PyCharm+Vagrant in my day job. Bidirectional syncing comes by default there and has been pretty seamless. I don’t usually need to create files in the VM but when I do having them appear on the host for inspection is great. I also haven’t noticed any performance issues with the setup compared to developing solely on the host.
Are you referring to bidirectional syncing built into Jetbrains IDEs? If so, I have mixed feelings about it.
I gave this setup a go with PhpStorm. Host-to-VM sync is pretty seamless, but you always have to manually sync changes from the guest to the host, so there’s that. Additionally, syncing can be very slow on large directory trees. I often found myself staring at a loading animation, waiting for the IDE to pick up all the changes and sync them to the guest.
AFAIK it is not recommended to open a project located on a network drive in a JetBrains IDE, but my experience with it so far has been great. Latency is not a concern because the VM is local.
Most importantly, I wanted to avoid a syncing situation where you have the source and the destination with two possibly differing states. This difference implies the possibility of conflicts which complicates things even further.
Accessing WSL files via 9P works pretty nicely, except when you use JetBrains. Half of its tools interacting with Docker & WSL do not understand how to work with the \\wsl$\ prefix. In some places you have to put \\wsl$$\, but in other places it gets converted to //wsl$/, which then breaks path mapping.
Have you tried running JetBrains IDEs from within WSL so it is "native"? I tried this a couple of years ago and WSLg had some issues which may be fixed by now, e.g. the find-and-replace popup would not appear.
Edit: there is also JetBrains Gateway as a solution now, but I find it less seamless than VS Code Remote, e.g. it requires using the larger 4 CPU / 32GB VM option via GitHub Codespaces and it doesn't sync plugins.
Path mapping on PhpStorm was very tricky for me as well (I’m running Docker containers inside my VMs so I had to set up correct mapping in PhpStorm if I wanted to run tests from the IDE) and I’m only talking about regular file paths here (no \\wsl$\ prefixes)…
Another reason to do this is because it's very easy to back up the entire VM using backup methods that are going to be considerably faster because they're backing up one big file rather than thousands/millions of individual files. And once backed up you have a single snapshot with integrity.
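For example, with a libvirt/QEMU guest it can be as blunt as this (VM name and paths are examples; an offline copy keeps the image consistent):

```
# Stop the guest, copy its single disk image, bring it back up
virsh shutdown devvm
# (wait for the guest to finish shutting down before copying)
cp /var/lib/libvirt/images/devvm.qcow2 /backups/devvm-$(date +%F).qcow2
virsh start devvm
```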
Repeatedly learned lesson: keep FS metadata as close to where you need the data as possible. Sharing or remoting metadata should be kept as a last resort for when the use case actually needs it, as doing so always costs orders of magnitude in performance at scale.
If I understand you correctly, then yes, I have eventually decided that I should only compromise on performance where it hurts less, e.g. for IDE access. If you’re running your application on a web server inside the VM for example, then this is the place where you probably need the best performance possible since this is where you’re going to feel it the most.
This seems like a slightly amateur/hobbyist recommendation, but every approach is valid under certain circumstances.
If it's a database or an otherwise expensive-to-recreate file (like a certificate that's subject to renewal rate limits), keep it somewhere you can get to it if the VM needs to be recreated. Typically this is a mounted volume on a SAN in larger self-hosted environments.
My use case is rather specific (web development) and my recommendation is of course not intended to cover all possible use cases. However, if you’re working on web projects and need VMs to isolate your development toolset, then this is the approach I chose after years of experimentation.
Desktop eDLP agents run on the hypervisor and MITM the connection between the endpoint and the destination (and/or monitor the filesystem). But if you don't have the agent on your VM...all the agent sees is an encrypted session passed through the shared adapter.
Correct me if I’m wrong, but isn’t it a bit pointless to worry about data exfiltration when we’re already talking about mounting (and, by extension, sharing) directories?
Not necessarily? I assumed the author was doing this in a work context and passing data between a physical corporate asset and a VM he created within it. It hasn't been exfiltrated until it leaves the corporate network.
I’m using this setup for my own projects but I suppose this setup could work in a work context as well if we’re talking about a work laptop with all of the project files already on it. In that case, I don’t see how spinning up a local VM and serving those files to the host would allow for data exfiltration, unless I’m misunderstanding you.
I am the author. :) I’ve clarified my use case at the top of the article. I’m not connecting to remote VMs or anything of the sort, all of this is happening in a local VM so the data exfiltration point does not seem to apply (unless I misunderstood your point).
More specifically, I’m having trouble seeing the issue with this approach if you’re, say, working on a laptop that has your project files and decide to spin up a local VM, place the files inside it and share them with the host via NFS/Samba/whatever.
I spent a lot of time working on this exact problem about 10 years ago. At the time I came to the conclusion that files hosted in the VM, with sshfs on the host (macOS or Linux host), was definitely the best available option.
sshfs is surprisingly good if you get the parameters set just right.
So, basically:
Performance accessing files on the host from the VM over NFS is not great, and this is bad.
Performance accessing files in the VM from the host over NFS is not great, and this is OK.
Kind of self-contradictory.
It isn't, if you’re more interested in performance at runtime. I can tolerate a certain delay when my IDE writes to a file on the host, but I need the best performance when my app is running.
Edit: the point here is about shifting the area of compromise. I would very much prefer to compromise on IO performance when writing/reading files with my IDE rather than sacrifice the web server performance on the guest.