We spent two weeks hunting an NFS bug in the Linux kernel (2018) (about.gitlab.com)
177 points by tosh on Oct 11, 2020 | 54 comments


This is a really detailed writeup and a huge amount of effort clearly went into the post.

The NFS maintainer seems to agree that this was a bug worth fixing. But I think I'm missing something - doesn't the specification say that the observed behavior (the file handle having gone stale) is acceptable, based on this snippet:

> A filehandle may or may not become stale or expire on a rename. However, server implementors are strongly encouraged to attempt to keep file handles from becoming stale or expiring in this fashion.

So, is it more of a 'the specification isn't sufficient for real world usage' kind of fix, or was something actually broken?


I understood that it’s an acceptable server response, as per spec, but it’s a bug on the client that it asks for the now-invalid file handle even though it’s had its delegation withdrawn - from the client’s perspective it has no reason to assume that the file even exists any more.

And yes, this is an excellent writeup.


“the specification isn't sufficient for real world usage”

I think that sums up NFS reasonably well. “Let’s make the API connectionless and stateless; that simplifies things” is true, but implementing Unix semantics, where files can be opened or closed, file deletes change state, file deletes aren’t seen by open file handles, programs want to do record locking, etc. on top of it? Maybe not the best of ideas.

There’s a reason S3 isn’t a file system.

(See also The UNIX Haters Handbook chapter on the Nightmare File System)


NFS hasn't been stateless since NFSv3 was introduced more than 25 years ago.

There was a remote fs before NFS which closely mimicked Unix file system semantics (AT&T's RFS). There's a reason you don't read much about it.


Originally you could run with NFS as your only storage, even swapping on NFS. If the server rebooted, the clients would get stuck but resume once it came back up. This requires some hard-to-deliver guarantees from the server.


From what I understand, that only worked if the processes making NFS requests kept retrying long enough.

But yes, being stateless has the advantage that the server won’t lose connections, open file state, etc. across client or server restarts.

That’s useful, but if you assume you have a file system (which NFS claims to be), and want to use those features, you’re asking for problems. NFS is an example of “worse is better”.


The idea was they’d keep trying forever, which typically is long enough. The downside is that if you retire a server, all the clients that still have files open will hang.


I used to work with a fairly large HPC environment which had a large number of NFS attached disks. One of the issues we ran into was unlink() on ext4; it could take quite a long time for large (TB..) files on busy servers. NFS would eventually timeout the request, the client would retry the operation, and get ENOENT back. It turned out that while ext4 blocked until it'd finished every detail of unlink(), the name was immediately removed from the filesystem namespace.
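
One way to live with that particular edge case on the client side is to treat ENOENT on a retried unlink as success, on the theory that a timed-out request may have completed on the server. A minimal sketch of that idea (a hypothetical wrapper, not something from the setup described above):

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Sketch: remove a file on an NFS mount, tolerating the case where a
     * timed-out request actually completed on the server, so the retried
     * operation comes back with ENOENT. */
    static int unlink_tolerant(const char *path)
    {
        if (unlink(path) == 0 || errno == ENOENT)
            return 0;   /* ENOENT: an earlier, timed-out attempt likely succeeded */
        perror("unlink");
        return -1;
    }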

NFS is in the category of works well enough, but the edge cases are a mile wide!


NFS is a gruesome mess when files aren't used exclusively by a single client (both namespace and timing(!) issues). A lot of "classic unix" ideas are very, very bad ideas on NFS and will either not work at all (good) or silently, seemingly randomly, discard data (yuck).


NFSv4(.1) was largely created to fix this. It has delegations, a mechanism for clients to have coherent caches and exclusivity when needed. As far as I know it works just fine, the world just mostly moved away from wanting NFS.


Wow, that brings back memories. Like "why not mount with intr?" Because NO. (wait, or was it soft mounts?)


Soft mounts may cause silent data corruption (it's actually the applications which don't check the return code on file system operations which cause the corruption, but there are just too many of those). Just don't do it.
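
To make the "applications don't check return codes" point concrete, this is roughly the checking a well-behaved writer has to do; with a soft mount, an error from any of these calls is exactly where ignored failures turn into silent data loss. A sketch only, with my own (hypothetical) error-handling style:

    #include <err.h>
    #include <unistd.h>

    /* Sketch only: every one of these calls can fail on a soft-mounted
     * NFS share, and ignoring any of them is where data silently vanishes.
     * (Partial writes are treated as fatal here just to keep it short.) */
    static void write_all_or_die(int fd, const void *buf, size_t len)
    {
        ssize_t n = write(fd, buf, len);
        if (n < 0 || (size_t)n != len)
            err(1, "write");   /* soft-mount timeouts can surface as EIO here */
        if (fsync(fd) != 0)
            err(1, "fsync");   /* ...or here */
        if (close(fd) != 0)
            err(1, "close");   /* ...or here, which is easy to forget on NFS */
    }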

People sometimes (ignorantly and irresponsibly) recommend (or have recommended -- NFS is squarely in the legacy corner of technology) soft mounts in an attempt to deal with unreliable servers, i.e. prevent clients from 'hanging' if the server doesn't respond. The real solution is of course to fix the darn server / the network.



How do others test their software on slow NFS?

I hacked up a deliberately slow NFS server [1], but it's awkward to deploy, and I don't think it's representative of real-world, so it's not part of any automated suite.

1: https://github.com/ridiculousfish/SleepyFS


Packaging it as a Docker image is a common way to make deployment easier. Your NFS server seems to be a good fit for a Docker image as everything runs in user space.


Didn't the programmers' grandfathers warn them that "GOTO is considered harmful"?

To paraphrase Macbeth -

"Is this a GOTO I see before me?

Come, let me hold you.

I don’t have you but I can still see you. Fateful apparition, isn’t it possible to touch you as well as see you?

Or are you nothing more than a GOTO created by the mind, a hallucination from my fevered brain?

I can still see you, and you look as real as this other GOTO that I’m pulling out now ..."


Why does it matter? The GOTO wasn't the problem, and GOTO is a reasonable solution to doing cleanup in C.

The problem is the code forgot to check for a particular situation: https://marc.info/?l=linux-nfs&m=153807208928650&w=2
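
For anyone not used to kernel C: the goto there is the standard centralized-cleanup idiom, not spaghetti control flow. A generic sketch of the pattern (illustrative only, not the actual NFS client code):

    #include <stdlib.h>

    /* Sketch of the kernel-style goto-cleanup idiom (illustrative only).
     * Errors unwind through the labels so each resource is released
     * exactly once, with a single exit path. */
    static int do_work(void)
    {
        int err = -1;
        char *a = malloc(64);
        if (!a)
            goto out;
        char *b = malloc(64);
        if (!b)
            goto out_free_a;

        /* ... work with a and b ... */
        err = 0;

        free(b);
    out_free_a:
        free(a);
    out:
        return err;
    }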


In my case it was my strict grandmother. Every time she found GOTO in my code, she spanked my hands with a stick.


Why is anyone using NFS?


Does it matter? When I walk into a building I don't get to tell their IT guys what protocols to use and which to avoid. I don't get to tell them my own preferences and make them conform to them. I don't get to start a job and demand they change everything about everything on day 1.

I adapt. And if they're using NFS, I want to make sure I can use NFS too. That's why fixing this bug was important.


> Why is anyone using NFS?

I use it a lot because the Linux kernel can mount it as rootfs during boot with little more than a few boot command line flags. This IMO comes in very handy when working on embedded systems (at least when they have Ethernet and a sane boot loader).

My development setup looks pretty much like this: My workstation is connected to the board and has DHCP, DNS, TFTP & NFS servers running on that port. Outbound traffic from the board is routed across the workstation and possibly passed through tcpdump or Wireshark.

The boot loader on the board would get an IP from my box, then load the kernel & device tree from the TFTP server. The kernel would then do the DHCP dance again and mount the NFS rootfs from my box.
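
For reference, the command line flags involved look roughly like this (the server IP and export path below are made up for illustration):

    (passed as the kernel arguments by the bootloader; IP and path are hypothetical)
    root=/dev/nfs nfsroot=192.168.0.1:/srv/nfs/rootfs ip=dhcp rw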

I can simply recompile the kernel, power cycle the board and have it boot into the new one with all the new/changed modules available in the rootfs; or recompile some user space software I'm working on, install it into the rootfs and have it available on the board immediately.

The only other options I can think of would be 9pfs and SMB. Support for SMB as rootfs was only added quite recently, and 9pfs would require some initrd trickery.


Because there aren’t many options for the use case of a shared filesystem with many readers and writers, support for directory authentication, and untrusted/less-trusted clients.

You pretty much have NFSv4 and SMB.


You were unfairly downvoted. This is an important question.

Sometimes NFS is your only option. When you have legacy applications which nobody is going to rewrite, and you need to scale some operations on a set of files over many nodes, NFS works. Its failure modes are well documented, so you can plan for and work around them.

It's also supported by most vendors as the default networked filesystem. Whether you're using a desktop PC, a million-dollar NAS, or cloud storage, they all support NFS.

It's also acceptable when time is of the essence and you just need to get something working now and don't have time to build something better. I rather like "good enough solutions" as a way to get teams "shipping", with an agreed-upon plan to replace it later.

But as someone who's used it for nearly 20 years, internally I start screaming and throwing things whenever someone suggests using it. Externally I propose redesigning for a more efficient solution :)


The most common alternative, SMB, has horrible performance when it comes to small file operations compared to NFS. When files get larger the performance starts improving, but when you're doing many operations on small files there's a two to three times speed improvement if you switch to NFS.

There's also the fact that macOS, Linux and Windows all have official support for NFS while SMB is achieved through analysing and reverse engineering Windows' protocols. There's no stability guarantee from Microsoft that SAMBA will keep working with the next update of Windows 10 while the NFS standard does maintain a minimum level of compatibility.

I can't think of another way to share files over the network outside of NFS and SMB with cross-OS support. Perhaps you know a file sharing mechanism that has similar small file performance with similar OS support?


> while SMB is achieved through analysing and reverse engineering Windows' protocols

The SMB2 protocol is documented here explicitly for interoperability purposes: https://docs.microsoft.com/en-us/openspecs/windows_protocols...

> There's no stability guarantee from Microsoft that SAMBA will keep working with the next update of Windows 10 while the NFS standard does maintain a minimum level of compatibility.

I don't think there is any difference between SMB and NFS in this regard. There is no guarantee that the next version of Windows will work with any NFS implementation either.


Because it's one of the few options for a sufficiently standardized file serving protocol that can be implemented on small and big sites (with dedicated servers) and doesn't require proprietary APIs and/or auth? In other words, a professional choice that keeps open engineering options going forward.


I can't remember where, but I once saw an open source project provide an NFS server package. Why? Because in the cases where you want to use FUSE (for example, to implement a locally mounted Google Drive/Dropbox/etc.), you could also just spin up an NFS server instead. That actually makes a lot of sense to me, because that NFS server is much easier to containerize, for example (you can't actually containerize a FUSE implementation and use it on your host. Or maybe you can, but at least it would be a very involved process). I'm not sure if NFS is the best protocol for this use case (maybe something like CIFS is better), but the general direction is actually interesting.


What would you recommend?


If you absolutely need to share one server's filesystem, Samba tends to have more reasonable failure modes. If you absolutely need a truly distributed filesystem then AFS is your only real option. But really a filesystem is almost certainly not the right interface; you're almost certainly better off using something higher-level - maybe HDFS if you need to store file-like data, maybe a key-value store or a distributed queuing system if you're doing something more structured.


AFS (at least OpenAFS) is not "truly distributed": it has a single point of failure for read/write volumes. Yes, you can make read-only replicas easily and many large AFS users make good use of that, but it doesn't fundamentally solve the problem or fundamentally do something NFS can't.

Also, you can avoid a SPOF in NFS with Isilon's commercial offering (which worked great in my experience, at least back when they were using an implementation based on FreeBSD's kernel NFS server), or potentially Red Hat's HA setup.

Also, CephFS is an option too and avoids a SPOF by design. I've only run Ceph block storage but it's absolutely a real distributed system and works well.


Remote file systems are the world's biggest pain in the ass. NFS seems like the easiest option, but it's ancient and insecure over the network. SFTP can be mounted and is also easy while being secure, but is super CPU heavy on a single thread.

All of the options I looked in to would fail when a program expects to be able to set permissions on things in the remote directory.


> NFS seems like the easiest option but its ancient and insecure over the network.

Re: old: NFSv4 is a different beast than NFSv3;

Re: secure: these things don't hold true with krb5 auth.


Wikipedia says v4 came out in 2000 and a search for encryption on the page shows no results.

sftp gives you really good key based authentication and encryption over the network. I wouldn't trust NFS for anything other than a highly secure internal network.


Mount with sec=krb5p and you get encryption (the p is short for Privacy).
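
On Linux that's just a mount option, roughly like this (the server name and paths below are made up):

    # hedged example; hostname and export path are hypothetical
    mount -t nfs4 -o sec=krb5p files.example.com:/export /mnt/secure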


>a search for encryption on the page shows no results

https://wiki.debian.org/NFS/Kerberos

krb5p is pretty secure. You just need a Kerberos implementation. The alternative is to run NFS over Stunnel, which is what Amazon does for EFS.


It's still not link layer encrypted, IIRC?


I would recommend redesigning the system to no longer require a filesystem shared over the network. (It appears to mostly work okay on a virtual network on the same host though.)


> I would recommend redesigning the system to no longer require a filesystem shared over the network

One of the first things that comes to my mind when I hear "networked filesystem" is the typical setup at most companies I worked at in the past (those with more than a dozen or so people; plus of course the schools and university I attended), where you have log-on authentication via LDAP/AD/... and the system would mount network shares with your user files, files from the teams/projects you were on, etc.

I would have a desktop PC in the office, but could also just walk over to the lab, login with the same credentials at a PC there and have all my files. Meeting rooms would also have permanently installed PCs and you wouldn't have to fidget with the projector connections/settings, you just log on and are good to go.

On top of that, none of those systems would be my personal property and I wouldn't take them home with me, or bother installing/updating software. Other people would be paid full time to make sure everything works and I have access to all the files and software I need (and only those I need). Sounds crazy, right?

I'm eager to hear your recommended redesign.


That's an okay use case. I was more thinking of what appears to be described in the article, i.e. running a large gitlab instance with the data directory in an NFS mount.


Hi, Developer Evangelist at GitLab here.

Scalability is a great point. With 13.0 we have added Gitaly Cluster, removing the dependency on NFS while also improving high availability. Gitaly is the backend daemon that GitLab communicates with to access Git repositories.

https://about.gitlab.com/releases/2020/05/22/gitlab-13-0-rel...

NFS support in Gitaly has been deprecated and will be removed in 14.0 next year. https://about.gitlab.com/releases/2020/05/22/gitlab-13-0-rel...

There are environments and use cases for NFS which are being discussed in this epic: https://gitlab.com/groups/gitlab-org/-/epics/1489


Thanks for this. Not only did I get a much needed laugh out loud, I'll be using it at a staff meeting tomorrow to get a laugh from the whole team. Sorry it's at your expense.


I used to have a desktop machine with my source tree (and editor and tools) on it. I had to compile the code on a machine with a different OS that I would frequently wipe and rev.

So I just nfs mounted my source tree and compiled on the remote machine.

I also tried CIFS; compiles took like 3x the time.


Why is anyone using linux? Or mac?


Really. I was about to comment "don't use NFS".


Can you expand on that / why not? It's certainly possible to use NFS in places where other tools would work better, but if the thing you actually want is filesystem semantics from multiple clients, NFS is definitely one of your better options, as far as I know / in my experience.


The authorization scheme sucks.


Can you expand on that? Which authorization scheme are you thinking of? I've run global-scale NFS deployments with sec=sys before and I wouldn't say it sucked.


So say you have an NFS share (server A) on a public IP address, and an NFS client (server B) that mounts that share with the sec=sys option. Doesn't that mean a rogue server C can connect to server A and impersonate any user?


If you a) run NFS on a public IP address, b) use sec=sys, and c) permit connections from any IP, then yes.

I think that can't be generalized to "The authorization scheme sucks" any more than if you run sshd on a public IP address and permit root logins with a password and pick a guessable password you can conclude that SSH's authorization scheme sucks. There's a number of stronger things you could do and they're perfectly normal.
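
The export side is where you'd tighten that up. A hedged /etc/exports sketch (the hostname and path are hypothetical), which both restricts who may connect and requires Kerberos identities instead of client-asserted uids:

    # hypothetical /etc/exports entry: named client only, Kerberos required
    /srv/share   trusted-client.example.com(rw,sec=krb5p,root_squash)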


My main grudge is that NFS authorization is pointless without external crutches like IP-address whitelisting.

Maybe it's because I'm used to password-based or key-based authentication methods, and when I first met NFS I was scared of that host-based authentication.

Of course I agree that IP whitelisting is a good practice that solves and prevents many issues.


Serious question: what alternative would you recommend for access to NAS volumes from unix-like OSes (Linux, *BSD, MacOS) over LAN? Other than doing discrete file transfers over a channel like SSH, I'm at a loss for alternatives here to NFS and SMB. Given the two, if Windows is not involved on the LAN, I don't see why SMB makes any more sense than NFS. What other options am I missing?


I found it very distracting to be told that things "beg the question" that really did not.

Only a person can beg the question, and then only if they are either very confused or are trying to deceive you.


The “misuse” of the term is so common that it is now included without comment in many dictionaries and reference sources (e.g. Wikipedia). The formal fallacy is now the more uncommon use, so it’s probably worth getting over it ;)


Strictly speaking, petitio principii is not a fallacy of reasoning but an ineptitude in argumentation: thus the argument from p as a premise to p as conclusion is not deductively invalid but lacks any power of conviction, since no one who questioned the conclusion could concede the premise.

(this was a quote from the Britannica)



