Gregg started with a demonstration tool that he had just written: its immediate manifestation was a high-pitched tone that varied in frequency as he walked around the lectern. It was, it turns out, a BPF-based tool that extracts the signal strength of the laptop's WiFi connection from the kernel and generates a tone in response. As he interfered with that signal with his body, the strength (and thus the pitch of the tone) varied. By tethering the laptop to his phone, he used the tool to measure how close he was to the laptop. It may not be the most practical tool, but it did demonstrate how BPF can be used to do unexpected things.
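For a rough sense of the idea (this is not Gregg's BPF tool, just a plain user-space sketch), one could poll the kernel's WiFi statistics from /proc/net/wireless and map the link quality onto a tone frequency; the interface name, quality scale, and frequency range below are assumptions:

```python
# Rough user-space approximation of the demo (NOT the actual BPF tool):
# poll the WiFi link quality from /proc/net/wireless and map it to a
# tone frequency that any tone generator could play.
import time

IFACE = "wlan0"            # hypothetical interface name
BASE_HZ, SPAN_HZ = 400, 2000

def link_quality(iface):
    """Return the link quality reported in /proc/net/wireless, or None."""
    with open("/proc/net/wireless") as f:
        for line in f.readlines()[2:]:          # skip the two header lines
            fields = line.split()
            if fields[0].rstrip(":") == iface:
                return float(fields[2])          # "link" quality column
    return None

while True:
    q = link_quality(IFACE)
    if q is not None:
        # Stronger signal -> higher pitch; 70 is a typical driver maximum.
        print(f"quality={q:5.1f}  tone={BASE_HZ + SPAN_HZ * q / 70:.0f} Hz")
    time.sleep(0.5)
```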
Is it still widely used and I just happen never to see it because of the environments in which I work?
Or is it only used for a small number of sites (or certain applications) but they happen to be extremely important ones?
Every Linux shop I've worked in over the past 10+ years has used NFS without any mention of an alternative. At the few places I've worked that mixed NFS with SMB, or used SMB exclusively, I had performance issues when jumping between the two (perhaps down to my own experience and the configuration).
My work has mostly been in Linux shops with 10s-100s of workstations+servers. NFS was used for homedirs and shared data. Sometimes servers were hundreds of miles away.
It’s good to see these replies as everybody’s experience is different and I learn by asking.
Perhaps NFS has improved a lot; I started using it in the 1980s after using client/server filesystems like IFS and Alpine at PARC, and the per-file protocols we had at MIT for the PDP-10s and Lisp machines. With NFS, locking and such were painful because it tried hard to look no different from a local filesystem, but couldn't be. Network-mounted homedirs were common for me from the early '80s, but under NFS they were too painful.
I imagine things have improved over the last couple of decades!
The environments I’ve used more recently have used a combination of replication (e.g. Dropbox), moved the computation to the data (“cloud” though for me typically this has meant in-house someplace rather than a third party or shared machine) or hybrid (e.g. IMAP).
Datasets tend not to be super huge — < 50 TB — so other approaches are used at the back end.
With CIFS/SMB throughput wasn't the issue, but dealing with small files seemed to be. Most places served software packages off of NFS, and whenever this was tried with SMB it was unreasonably slow. I'm ignorant enough of the implementation that I could believe this was a configuration thing.
I don't use automount, but unless automount does some crazy magic ... this doesn't really help; umount -l (or umount -lf) indeed removes the mountpoint from the filesystem (so no new processes can access it and get stuck), but the kernel thread is still stuck waiting for an answer that will never come, and many processes cannot be killed even with "kill -9".
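For what it's worth, the reason "kill -9" has no effect is that those processes are blocked in uninterruptible sleep ("D" state) waiting on I/O to the dead mount, and that state cannot be interrupted by signals. A small sketch like this (it just scans /proc; nothing NFS-specific is assumed) will list them:

```python
# Sketch: list processes stuck in uninterruptible sleep ("D" state), which
# is the state processes blocked on a dead NFS mount end up in, and why
# "kill -9" has no effect until the I/O completes or the mount goes away.
import os

def d_state_processes():
    stuck = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                stat = f.read()
            # The field after the parenthesised command name is the state.
            state = stat.rsplit(")", 1)[1].split()[0]
            if state == "D":
                with open(f"/proc/{pid}/comm") as f:
                    stuck.append((int(pid), f.read().strip()))
        except OSError:
            continue            # process exited while we were looking
    return stuck

for pid, comm in d_state_processes():
    print(pid, comm)
```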
Usually, whatever old process that is hanging onto the mount gives up or dies. If not, often a "kill -9" takes care of it. I have had these processes get stuck indefinitely. I was usually doing something stupid. For example, mounting a USB drive over NFS for a user. They got trigger-happy and pulled the USB drive without unmounting it or informing anyone. Effectively, I just considered that mount point "burned" on the client until I could reboot.
I'm not sure how other people use network mounts. In general, they're expected to be up, and any changes or removals will be part of a maintenance window. Changes would go through a series of steps to avoid this scenario (stop new mounts, kill existing processes/mounts, update). Sure, it's a bit of a pain, but it doesn't seem too unreasonable in practice. My experience has mostly been with NFSv3; for all I know newer versions address this better.
Gitlab.com was also using it as of last year, but I’m not sure if that’s still the case.
When the NFS server crashes, bad things happen, and if you are building a redundant multi-master NFS setup it's a lot simpler to use iSCSI or an FC SAN.
My favourite moment operating one cluster was when I intentionally caused a kernel panic and core dump on a live production system to gather debug info for an issue we were having. It had zero observable impact. :) NetApp is not cheap, but they seem to know what they're doing.
I believe my alma mater's CS department is still using NFS for home directories last I checked.
There are some issues with NFS(v4) on the bleeding edge by default, but it has been really cool to see some of the NFS workarounds for dealing with big data.
Given the volume of btrfs negative experiences that are going to need to be overcome (for example, many of the posts in ), maybe people have just given up? If ZoL licensing wasn't a problem, would anyone even be interested?
But it's also been the default filesystem for SUSE Linux and Synology NAS products for a long time now, and they don't seem to be having any problems.
I don't know what to believe.
I filled up the root filesystem by accident once, which is supposed to be a very bad situation for btrfs. I was able to ssh in, diagnose stuff, delete the offending files, and then I did a minor rebalance to clean up some of the unnecessary metadata space that got allocated (not essential but I wanted to try it). No big deal and still works fine.
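For anyone curious, a "minor rebalance" like that usually means a filtered balance that only rewrites mostly-empty chunks, returning over-allocated data/metadata space to the pool. Something along these lines, where the mount point and the 10% threshold are illustrative rather than what the commenter actually ran:

```python
# Sketch of a filtered btrfs balance: only chunks that are at most 10%
# used are rewritten, which reclaims over-allocated space without
# rewriting the whole filesystem. Assumes the btrfs CLI is installed.
import subprocess

MOUNT = "/"   # illustrative mount point

subprocess.run(
    ["btrfs", "balance", "start", "-dusage=10", "-musage=10", MOUNT],
    check=True,
)
```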
I haven't used any btrfs raid features. I mirror it (via daily rsync) to an ext4 filesystem on another device to guard against filesystem-related bugs and device failure.
ZFS, as much as I like it, will unfortunately never be part of the kernel, and users will constantly be put into this kind of situation: https://www.phoronix.com/scan.php?page=news_item&px=ZFS-On-L...
They don't use the RAID support from BTRFS (they do the RAID themselves in software one layer down, using mdadm), only the snapshots, CoW, and general storage features.
There very occasionally (based on forum posts) seem to be issues with metadata fragmentation for some people when there's a lot of metadata, but otherwise it seems very reliable (I've got two myself).
It also seems to be pretty solid in enterprise-friendly RAID10 software mode. Anything outside of that should be treated with extreme caution and avoided for anything you can't afford to lose. For RAID5/6 in particular, as far as I know, little has changed since the discovery of a major corruption-creating bug (which supposedly was going to require a complete rethink of major portions of the Btrfs architecture). It's an embarrassing situation, but no one seems to want to work on it because it would require a lot of work to fix, and anyone who's paying people to work on Btrfs isn't going to run RAID6 in production anyway.
It's unfortunate, but the real best advice to anyone who wants software RAID and doesn't know exactly what they're getting into is "just use ZFS".
I personally have been using it for maybe 7 years and it works just fine, and snapshots are awesome. To me, btrfs has been more stable than ext3/4 were back when I used them.
Personally, BTRFS feels like it has a ways to go before it's ready for prime-time.
I've had two major and one minor BTRFS-related issues that have scared me away from it.
1) One of my computers got its BTRFS filesystem into a state where it would hang when trying to mount read/write. What I suspect is that there was some filesystem thing happening in the background when I rebooted the machine. I rebooted via the GUI and there was no sign that something was happening in the background, so this was really a normal thing that a user would do. No amount of fixing was able to get it back, but I was able to boot from the installation media, mount it read-only, and copy the data elsewhere.
2) Virtually all of the Linux servers at work will randomly hang for between minutes and hours. This was eventually traced to a BTRFS-scrub process that the OS vendor schedules to run weekly. The length and impact of the hang seemed to be based on how much write activity happens - servers where all the heavy activity happens on NFS mounts saw no impact, but servers that write a lot of logs to the local filesystem would get severely crippled on a weekly basis. We've moved a bunch of our write-heavy filesystems to non-BTRFS options as a result of this.
3) This is a more minor issue, but it still speaks to my experience. I had a VM that was basically a remote desktop for me to use. Generally speaking, it would hang hard after a few days of uptime with no actual usage. When I reinstalled it on a non-BTRFS filesystem (sorry, can't remember which one I used) it was rock solid. I have no proof that this had anything to do with BTRFS.
All of these were things that happened around a year ago, so they may not be a true representation of the current state of BTRFS. But they've burned me badly, so now any use of BTRFS will be evaluated very carefully.
In contrast, I've been running ZFS on a couple of FreeBSD servers, with fairly write-heavy loads, and have had no issues that were filesystem-related. Even under severe space and memory constraints ZFS has been rock solid.
The first problem is directly attributable to BTRFS. There is no way a filesystem should get corrupted by a simple user-initiated reboot, regardless of what the system is doing in the background.
The second problem is a combination of BTRFS and the distribution. The distribution added a weekly job which did a BTRFS scrub (IIRC), under certain workloads that would completely hang machines for minutes to hours. The time this ran seems to be based on when the OS was installed, so as luck would have it these brought production systems down during business hours.
The third problem is something I have no idea about. It could be BTRFS, it could be something completely different, I honestly have no idea.
>The first problem is directly attributable to BTRFS. There is no way a filesystem should get corrupted by a simple user-initiated reboot, regardless of what the system is doing in the background.
It doesn't seem like you have any direct evidence that this has something to do with rebooting. In fact, Btrfs is much less susceptible to these sorts of problems than previous file systems. That's because writes in Btrfs are atomic, so a file is never "partially" written to disk. Either it's written, or it isn't. You can't get disk corruption from write failures.
What I suppose might have happened is that you rebooted the system in the middle of installing a bootloader or kernel update. I don't know if you tried mounting the partition r/w from another system or not, but assuming you didn't, it's probably more likely that something broke on your system that prevented it from remounting the partition during the boot process.
To recover, I booted from an installer image on a USB drive and tried to mount the partition R/W and it hung. Since the installer is a known-good environment, this rules out breaking my system the way you describe. This was a corrupt BTRFS filesystem.
One possibility is that because I was using a rolling-release distribution, I may have gotten a version of BTRFS with a bug in it, or minor changes in the BTRFS code as the system got updated may have eventually gotten the filesystem into a state that rendered it unable to mount R/W. Neither scenario inspires confidence.
It's been a while since then, but that was our experience. I don't have access to the records or email from the time. Maybe it's been fixed.
In many ways ext4 is "good enough", but there are sweet, juicy advantages to copy-on-write, checksummed filesystems. I guess all of the enthusiasts are using ZFS, which is a work of art, but it has licensing problems AND its IP is owned by a law firm with some engineers attached that was kickstarted by the CIA.
I use ZFS and know a bit of the history, but it took me a while to realize that meant "Oracle". I didn't realize the CIA was their first customer (something I found while googling just now), but I suppose that's not too surprising. Although to be fair, I am not sure having the CIA as a client in 1977 means much for ZFS.
Also, a thin LV with XFS will give you many if not most of the benefits of BTRFS/ZFS, but with stability, and it's available on any reasonably old Linux distro.
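A sketch of what that setup typically looks like, assuming a volume group named vg0 already exists (all names and sizes are illustrative, not anything the commenter specified):

```python
# Sketch of the thin-LV-plus-XFS approach: a thin pool, a thin volume
# carved out of it, XFS on top, and snapshots taken at the LVM layer.
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

# Thin pool and a thin (over-provisioned) volume inside it, then XFS on top.
run("lvcreate", "--type", "thin-pool", "-L", "100G", "-n", "pool0", "vg0")
run("lvcreate", "-V", "200G", "--thinpool", "pool0", "-n", "data", "vg0")
run("mkfs.xfs", "/dev/vg0/data")

# Snapshots are thin snapshots of the origin volume: effectively instant
# and space-efficient until blocks diverge.
run("lvcreate", "-s", "-n", "data_snap1", "vg0/data")
# Note: an XFS snapshot keeps the origin's UUID, so mount it with -o nouuid.
```

What you don't get from this, compared with BTRFS/ZFS, is data checksumming, so it covers the snapshot/rollback use case rather than the bit-rot one.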
ZFS on Linux is not a re-implementation of ZFS.
Initially I thought this was about the recent Spectre vulnerability variants related to BPF. Then I found it is actually discussed in "BPF: what's good, what's coming, and what's needed".