
Figuring out how to swap NFS mounts behind a massive fleet of KVM instances

Long story short: you can't just unmount /somepath, mount it again, and pretend all is well.

I learned a lot about mounts, particularly how the kernel tracks them with IDs.

Definitely not novel, but the application (and its consequences) made me a hero that week.




Could you explain more? Sounds like a fun problem.


I'll do my best! Dodging proprietary bits (and bad term usage) along the way :D

First, some background: we had to pivot from one NFS instance to another. Assume the data was already consistent.

The goal was to minimize observable disruption to the guests. We could pause them briefly, but couldn't restart the instances -- the services inside had to be unaware.

Processes hold onto the files they have open. This is pretty well understood -- most people have heard of file descriptors.

These are very sticky -- particularly for files that live on mounted filesystems. This is where my path to glory appeared.
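You can see the stickiness in miniature (a self-contained sketch using a plain file, since demonstrating it with a real mount needs root; the mount case behaves the same way):

    import os, tempfile

    # Open a file, then remove its path. The descriptor still works:
    # the kernel resolves the path only at open() time; afterwards the
    # process holds a direct reference to the underlying file.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"still here")
    os.unlink(path)               # the path is gone...
    os.lseek(fd, 0, os.SEEK_SET)
    print(os.read(fd, 32))        # ...but the fd reads fine: b'still here'
    os.close(fd)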

The thinking was... as long as that path was there when the VM process was resumed, we'd be fine...

In reality, we weren't! The kernel isn't really concerned with the fully qualified path.

From the example above, /somepath is really just "mount ID 2" to the kernel. Remounting at the same path creates a new mount with a new ID, and descriptors opened under the old mount keep pointing at the old one.
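You can see those IDs yourself: the first field of each line in /proc/self/mountinfo is the mount ID (a quick sketch, Python for brevity):

    # Print the kernel's mount ID for every mount in this namespace.
    # Unmount and remount the same path, and you'll see a new ID appear.
    with open("/proc/self/mountinfo") as f:
        for line in f:
            fields = line.split()
            mnt_id, parent_id, mount_point = fields[0], fields[1], fields[4]
            print(f"mount ID {mnt_id} (parent {parent_id}) -> {mount_point}")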

In the end we had to renew those file descriptors so they picked up the new mount IDs.
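For an ordinary process, the textbook renewal pattern is open-then-dup2. A hypothetical sketch (the names are made up, and in our case the renewal actually fell out of the save/restore dance described next):

    import os

    def renew_fd(old_fd: int, path: str, flags: int = os.O_RDONLY) -> None:
        # Re-resolve the path now, so the new descriptor lands on the
        # new mount (and its new mount ID).
        new_fd = os.open(path, flags)
        # Point the old descriptor number at the new file in one step,
        # so any code still holding old_fd keeps working. (The offset
        # starts fresh; real code would lseek to match the old one.)
        os.dup2(new_fd, old_fd)
        os.close(new_fd)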

We ended up pausing the instances, saving the memory state locally, swapping the mounts, and then resuming the VMs.
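Per host, the sequence was roughly this shape (a hypothetical sketch using stock virsh and mount; our real tooling was internal, and the names here are made up):

    import subprocess

    def swap_nfs(domain: str, state_file: str, mountpoint: str, new_source: str) -> None:
        def run(*cmd: str) -> None:
            subprocess.run(cmd, check=True)

        run("virsh", "save", domain, state_file)           # pause the VM, dump its memory to disk
        run("umount", mountpoint)                          # drop the old NFS mount
        run("mount", "-t", "nfs", new_source, mountpoint)  # mount the new server at the same path
        run("virsh", "restore", state_file)                # resume; the restored process opens its files fresh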

Time only briefly skipped, and we successfully moved thousands of instances from one NFS 'host' to another.



