Proposal to move CoreOS off of btrfs (groups.google.com)
63 points by jsnell 593 days ago | hide | past | web | 48 comments | favorite



Thank goodness. btrfs has bitten me way too often on CoreOS. I eventually succumbed to wiping the /var/lib/docker mounted directory on every reboot to minimize the low-disk-space issues, but issues still exist nevertheless. Even today, I still have cases where docker complains about not being able to create a container/file due to some inane error. I've had to reboot machines many times just to get it to cooperate. Furthermore, wiping the partition means my machines need to redownload their docker images on every reboot, so they come up more slowly than necessary.
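For anyone wanting to do the same, a rough sketch of that wipe (paths and the subvolume layout are illustrative, and note that plain rm -rf won't remove btrfs subvolumes):

```shell
# Hypothetical sketch: clear Docker's btrfs-backed storage.
# Stop the daemon first; all images get re-pulled afterwards.
systemctl stop docker
# btrfs subvolumes must be deleted explicitly, not rm -rf'd:
for sub in /var/lib/docker/btrfs/subvolumes/*; do
    btrfs subvolume delete "$sub"
done
rm -rf /var/lib/docker/*
systemctl start docker
```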


This is ridiculous. CoreOS is managing their own distro, so why not just package in ZFS and get it over with?

It's been how many years now waiting for the next great ZFS competitor? If nobody is able to improve on ZFS, how about we just all jump on the bandwagon and move on with our lives?

If nothing else, having more people using ZFS may inspire someone to actually improve on it. BTRFS has such a limited feature set in comparison to ZFS that with the advent of ZFS on Linux, it's really hard to understand the community's continued backing of BTRFS as the Linux community's CoW filesystem competitor to ZFS. Spend a few weeks using ZFS on Linux on a machine with a few SSDs and a few HDDs, or just enable it on a linux laptop with a SSD drive, and it's hard to go back to anything else.


Because Sun decided to sabotage it when they released it, and Oracle hasn't stepped back yet.


Sabotage? How, exactly? I haven't heard of any major corruption or performance bugs in ZFS; what am I missing?


From http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue

"ZFS is licensed under the Common Development and Distribution License (CDDL), and the Linux kernel is licensed under the GNU General Public License Version 2 (GPLv2). While both are free open source licenses they are restrictive licenses. The combination of them causes problems because it prevents using pieces of code exclusively available under one license with pieces of code exclusively available under the other in the same binary. "


As others have said below, the CDDL (or 'cuddle' license, as some choose to pronounce it) is incompatible with the GPL, which means two things.

1 - We can't just take the code and add it to the kernel.

2 - The patent and other legal grants in the licence won't apply if we try to reverse engineer a GPLv2-compatible equivalent for the kernel to use, which is a HUGE risk for the few companies that might consider the cost of reverse engineering worth it.

And before you say it can be done without company support, I have two things to say. First, how's that going for the ReiserFS 4 fans who wanted to keep improving that? Second, in order to continue work on ZFS effectively, the existing ZFS developers have 'ganged up' so that the OpenZFS project can be a single source of truth for the ZFS source code. A reverse-engineered version would be very unlikely to benefit from this, and would therefore require even more developer time just keeping up with the improvements 'upstream' in the 'original' ZFS codebase.


  In the case of the kernel, this prevents us from distributing ZFS as part of 
  the kernel binary. However, there is nothing in either license that prevents 
  distributing it in the form of a binary module or in the form of source code.
Source: http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue

  ZFS cannot be added to Linux directly because the CDDL is incompatible with the GPL. 
  ZFS can, however, be distributed as a DKMS package separate from the main kernel package.
Source: https://wiki.ubuntu.com/ZFS

So why not just distribute the DKMS package with a distro? Easy enough. Will it ruin the user experience somehow?
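That's essentially what the ZoL project already offered on Ubuntu at the time; roughly like this (PPA and package names as I remember them, so treat as illustrative):

```shell
# Install ZFS as a DKMS package; the module is rebuilt automatically
# whenever the kernel is updated
add-apt-repository ppa:zfs-native/stable
apt-get update
apt-get install ubuntu-zfs   # metapackage pulling in zfs-dkms + userland tools
modprobe zfs
zpool status                 # verify the module loaded
```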


licensing


The license for ZFS is still CDDL


Yep. It's fun when your entire etcd cluster dies at the same time (because you started a global unit that pulls a huge docker container).

You'd think they'd have safeguards to automatically rebalance under certain conditions (metadata filling up, etc.), but no: you have to babysit btrfs.


I'm glad I'm not the only one with these problems. I was starting to think I was going crazy with my etcd+btrfs issues.


hrm... what's the failure scenario here? Are there any good btrfs management utils that can help with running btrfs in prod?


> Even today, I still have some cases where docker will complain about not being able to create the container/file, due to some inane error. I've had to reboot machines many times just to try to get it to cooperate.

I get those, and am not using btrfs...


yes this, I get those too and no btrfs


Is it this one? https://github.com/docker/docker/issues/5684

Or this? https://github.com/docker/docker/issues/4036


I've seen lots of devicemapper failures running docker build on ext4 (AWS SSD EBS). Really painful to have to rebuild on failures, since these issues seem to happen quite often.


I've been bitten badly by btrfs before as well.

I wish it were feasible to get CoreOS (and Docker, for that matter) to add support for ZFS. I use LXC + ZFS every day and it is stable, reasonably fast, and all around wonderful!!!

sigh.


So... I just installed Ubuntu 14.10 on my (new) home PC, and chose BTRFS because I figured it was stable by now. Should I be worried? I do plan on playing with docker.


I've been using btrfs on several machines for about 3 years. It is not stable: https://btrfs.wiki.kernel.org/index.php/Main_Page#Stability_...

When it goes wrong (about every 4 months for me), you will end up in a nightmare. Generally attempts to fix issues will make things worse, various tools and pages contradict each other, and the devs are only interested in the latest kernel version. Much of this is deliberate - the code is written to not sweep things under the rug, which means you can hit problems and not recover. Use backups and make sure you can restore.

The reason why I keep using it is because there is no silent corruption as you get with ext4. A scrub can verify every byte of data is unaltered and recover if using anything other than single profile. Compression, volume management, cheap snapshots etc also make working with it nice. Until things go wrong.
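For reference, the scrub mentioned above is a single command (mountpoint illustrative):

```shell
# Read and verify every checksummed block; with dup/raid1-style
# profiles, bad copies are rewritten from a good one
btrfs scrub start -Bd /mnt/data   # -B: run in foreground, -d: per-device stats
btrfs scrub status /mnt/data
```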

The single biggest "going wrong" is running out of space. Copy-on-write filesystems by their nature leave existing content alone and write new data into spare space, eventually garbage-collecting the obsolete data. When you are out of space, that becomes rather difficult, with bizarre symptoms and tricky recovery.
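The usual escape hatch in that situation (all chunks allocated even though df still reports free space) is a filtered balance; the mountpoint here is illustrative:

```shell
# Repack only lightly-used chunks, returning whole chunks to the
# unallocated pool instead of rewriting the entire filesystem
btrfs balance start -dusage=50 -musage=25 /mnt/data
# Check how much is allocated vs actually used afterwards
btrfs filesystem df /mnt/data
```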


Thanks! Luckily, I hadn't done much more than just installing Ubuntu, so now I'm reinstalling it, this time with EXT4 :)


Just be aware that other filesystems like ext4 do not checksum their data. They have no way of telling if corruption has happened, nor are the diagnostics useful if you do somehow figure out that a block is problematic. This has happened to me several times over the decades as hard drives have lost the plot, or due to bugs. Backing up a corrupted file gets you nonsense in the backup. (A standard btrfs demo is deliberately corrupting the filesystem and then showing recovery. You can't do that with ext4 since it has no idea if what is there is correct in the first place.)

You can use tools like LVM and md as a layer underneath ext4 to provide some resiliency, but there is a learning curve and two sets of tools to work with. Changing around disk/partition sizes isn't much fun with them.


I've not used it in about a year, but when I did it was slow as molasses. I didn't find any corruption issues, but the backup data I was storing there was at least an order of magnitude slower to access on btrfs than on xfs, which I eventually switched to. Yes, really, an order of magnitude.

It's too bad, I really love the idea of btrfs, but there's no way I would ever run a filesystem on it yet.


I've been using it on 2 home machines, for the home partitions, one using an HDD and one an SSD. Been at it for over 2 years now, and I don't think I've ever had any filesystem-related hiccups. Granted, they don't do anything special, just home stuff, but one is at around 75% capacity, so I guess you don't need that much extra disk space.


Just make sure you have plenty of disk space. Otherwise: http://ram.kossboss.com/btrfs-cant-mount-readwrite-full-spac...


For everyone complaining about the metadata rebalancing, btrfs 3.18 (CoreOS is currently using 3.17) added auto rebalancing:

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

https://btrfs.wiki.kernel.org/index.php?title=Main_Page#News


First they came for your SELinux... and you did nothing. Then they came for your Docker... and you did nothing. Now they've come for your filesystem... Madness!

Just as BTRFS is finally stable and fast, CoreOS decides yet again to switch.

We've been using BTRFS in production for over a year now (and heavily with Docker) and haven't suffered any problems at all(1), in fact we actively simulate failures to practice / test repair processes which have all been successful.

To me, while diversity is great, CoreOS is going the path of Ubuntu with its NIH syndrome.

(1) With the exclusion of a slow docker push/pull bug that's about to be patched: https://github.com/docker/docker/pull/9720


Can you clarify how this is an example of NIH? They're switching the default from btrfs to ext4, not writing their own filesystem.

Also, it's necessary if they want to switch to overlayfs as the Docker backend, because btrfs lacks whiteout support which causes problems with overlayfs.

Switching to overlayfs for Docker makes a lot of sense, given that you get the dedupe benefits of AUFS with lower overhead than devicemapper/btrfs and it's now part of the mainline kernel tree.


overlayfs is still in its infancy, and you're then still relying on an underlying filesystem. Also, unless I'm mistaken, OverlayFS doesn't provide compression or checksumming?


Am I right in saying that the complaints with btrfs in CoreOS are specifically around its use in conjunction with Docker?

(Interested as I'm thinking about building a homebrew NAS/general-purpose server w/ btrfs; there's a lot of outdated info on btrfs, but I was getting the impression that it's now a pretty stable and usable filesystem)


I can say that ZFS has worked great for me on BSD-based home servers. I haven't used ZFS with Linux yet; though it's possible to do so, it's just unpopular, partly for licensing reasons. I suspect what type of RAID you do may have greater consequences than what file system you pick, particularly if your distro is already designed for serving files on the file system you choose. Oh, and working out all the AFP/Samba bits is fun, because there's always something that surprises you.


When I last tried it, the issue with ZFS was one of performance and the massive amount of RAM required to enable the features I wanted.

It seems performance has improved and is still improving, according to this zfsonlinux benchmark by Phoronix from last year: http://www.phoronix.com/vr.php?view=19059


I got severely burned by ZFS on Linux running in AWS. Heavy NFS load (ZFS NFS, not Linux kernel NFS) caused a kernel panic, pretty reproducibly. This was on Ubuntu 12.04 with the official ZoL PPA sources, so YMMV.


For managing volumes, ZFS on Linux works great. But for managing NFS, I'd definitely go with a separate NFS implementation if I wanted to use NFS heavily. The primary developers/users of the ZFS on Linux port are mainly using it for highly-available single-machine volume management, exporting volumes to Lustre or other clustered filesystems for use in massive HPC clusters like those over at Lawrence Livermore National Labs.

http://www.nersc.gov/assets/Uploads/W01-ZFS-and-Lustre-June-...

ZFS is meant for managing local drives, and to make it performant you need to configure SSD partitions to act as an L2ARC cache. The online documentation is pretty good, so after going through the docs it should be pretty clear how to set ZFS up properly for your use cases.
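For what it's worth, adding cache (and log) devices is a one-liner each; pool and device names here are made up:

```shell
# Attach an SSD partition as L2ARC read cache
zpool add tank cache /dev/disk/by-id/ata-SSD1-part4
# Optionally, a mirrored SLOG to absorb synchronous writes
zpool add tank log mirror /dev/disk/by-id/ata-SSD1-part5 /dev/disk/by-id/ata-SSD2-part5
# Watch the cache device fill and start serving reads
zpool iostat -v tank 5
```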

But it sounds like you're using EBS drives? If so, not sure why you'd want to use ZFS. Last I checked, ext2 or xfs was the way to go with EBS drives on AWS. AWS has so much stuff going on in the background to ensure reliability/availability of EBS volumes that adding another layer isn't worth it IMO and I've seen similar kernel panics running other complicated volume managers on top of EBS.


> Heavy NFS load (ZFS NFS, not linux kernel NFS) caused a kernel panic, pretty reproducibly.

"ZFS NFS" is "linux kernel NFS". Setting "sharenfs" options on ZFS on Linux dataset simply informs the normal kernel NFS service of those exports.


I've had ZFS on linux freeze up on me every several months too. Not a production system fortunately.


One thing you might want to check is if you're setting a limit in the driver for the amount of memory ZFS uses for caching. By default it'll use a LOT of memory, so I usually just set to a max of 2GB and don't see any issues.
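On ZFS on Linux that cap is the `zfs_arc_max` module parameter; a sketch of limiting the ARC to 2 GiB:

```shell
# Persist the limit across reboots (value in bytes: 2 GiB)
echo "options zfs zfs_arc_max=2147483648" > /etc/modprobe.d/zfs.conf
# Apply immediately on a running system:
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max
```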


Checkout Rockstor https://news.ycombinator.com/item?id=8375236


If so many people are so regularly bitten by these BTRFS issues – they're easy-to-reproduce and painful – shouldn't they be relatively easy to prioritize and fix?


I'd like to see a real-world, reproducible BTRFS bug on 3.18 that's a serious problem.


If Btrfs is as bad as this proposal makes it look, why did the folks at openSUSE make it the default file system for the root partition on openSUSE 13.2?


Yeah I haven't experienced any BTRFS problems, it could just be people stuck on RHEL running kernel 2.6 / 3.2.


Outside of docker, most people aren't using snapshots super heavily, which sounds like the root cause here.


ZFS is coming for Docker. This is a natural move. Now that enterprise relies on the cloud they need enterprise-reliable storage -- and ZFS is proven. Looking forward to the ZFS-hosted cloud future.


Funny thing is that if ZFS happens in Docker it'll be only because the community demanded it over the opposition of the core Docker maintainers. AFAICT, Aufs is being dropped due to massive opposition from Redhat. (Apparently AUFS code is too horrible to merge/support???) Despite its spotty track record, BTRFS is the only current recommended/supported CoW file system, and OverlayFS looks to be the docker-blessed next-gen Aufs alternative.

Based on past experiences, I don't see Docker-blessed ZFS support coming anytime soon, but I hope I'm wrong. Maybe someone over at Joyent or Oracle can grease the wheels here? :-)


You mean Joyent?[1] It has existed for quite a while.

1. https://www.joyent.com/


The out-of-space / metadata balancing problem has bitten me more times than I care to count. It's essentially a fact of life that I have to blow away /var/lib/docker and all its subvolumes every few weeks on any given machine, to clear an out-of-space problem.


Copying a sentence from the first reply on the list?


That bug has been fixed; please update your kernel.



