However, it's worth noting that RAID, especially software RAID on systems without ECC RAM, is less than ideal. Beyond that, there's the overhead of managing larger filesystems with ZFS. The DIY RAID servers that support it have had some troublesome issues that I've experienced first hand.
It's likely a lot of these advantages were displaced by the discontinuation of Apple's server projects, as well as other fears. On a similar note, I've always been somewhat surprised that NTFS hasn't been more popular for external media, as it's a pretty good fit there.
In the end, software at this level has been held back significantly by the patent hornet's nest that exists in the world today. I truly hope that we can see some reform in this space, but with the likes of TPP and similar treaty negotiations around the world today, that is pretty unlikely. Eventually some countries will have to strike out, break these bad treaties and rein in IP law. The vast majority of software is undeserving of patent protection, just as software really shouldn't have copyright law that lasts for decades. It's a shame all around.
We are full steam ahead!
We transitioned all of rsync.net, globally, to ZFS starting in late 2012. After that, we began offering zfs send and receive (over ssh, of course) into rsync.net filesystems.
It's working wonderfully. We have many, many customers that do extremely efficient backups of very large zpools/snapshots into the rsync.net location of their choice. It's been tested and reviewed favorably in the press.
I've gone about ten years thinking running a defrag - particularly on a virtual disk sitting on a SAN in a RAID array - made no sense. Today's downtime says otherwise.
It's one thing to perform poorly. But seeing services just die with disk write errors on a disk that's 30% full has left me with a very low opinion of NTFS. Our Oracle person sounds like a desktop maintenance person from the '90s making an issue out of "the importance of weekly defrags".
Windows can clearly handle big databases. The people getting paid to run the platform need to have a clue about the platforms they are running.
I'd say it depends on the use case. For business storage servers, especially "mission-critical", sure. (Then again, this data was created somewhere; presumably mainly on laptops without ECC. Why is that not a bad idea?)
But for a home storage server, ECC is no more critical than buying a UPS, shielded SATA cables, enterprise-grade drives, a high quality power supply, etc. etc. Together these upgrades will easily double the cost of the storage server. Since you'll want an off-site backup solution anyway in case of disaster, why bother paying for an expensive server that only reduces the chance of needing to restore from backup from 0.001% to 0.00001%?
Flipping it around: what's the point of backing up to a remote server if the data you're backing up is itself corrupted?
I don't think most of the upgrades you mentioned are really in the same bucket as ECC DRAM because only the ECC DRAM seems likely to prevent data corruption. Failing drives and cables and power loss, even failures that corrupt data, should be survivable events assuming drive redundancy, particularly given a filesystem like ZFS that will detect and repair the corruption.
If everyone used ECC RAM, it would be marginally more expensive than regular RAM, but it would let us (theoretically) cut the number of real-life crashes by a huge degree. I think if every computer system were like this, it would have non-obvious positive effects on society (like the ability to safely give computers increased responsibility).
Do you have a source for that claim? I know people have done tests running large molecular simulations on hundreds of GPUs with/without ECC for days on end, and who've concluded that ECC probably isn't worth it.
And this makes sense. Most of a program's memory usage is data. Thus the rare event of a bitflip is most likely to occur in data. Such a bitflip doesn't cause a crash in a consumer system; maybe it changes the color of one pixel in one frame of a video file, or adds some other tiny noise somewhere.
The question is then: would reducing one crash per lifetime of the machine on every ten machines be a huge improvement?
I think you understand this but in case it's not clear to others (I wasn't sure until I got to the end of the paper) all of the machines in their study are using ECC memory. This allows them to measure both the number of Correctable Errors (single bit) and Uncorrectable Errors (more than single).
I think you are right that the chances of a single bit flip for a randomly chosen machine are about 25%. In Table 1, they say CE incidence for a randomly chosen machine is 32% per year. But they also show that a machine that has one correctable memory error is much more likely to exhibit many more.
I think that "CE Median Affct" (also in Table 1) says that among the machines that have been affected by at least one correctable error, the median number of errors is 277. Thus rather than one crash per 10 machines over 3 years, it's likely that you'll get hundreds of crashes, but with one (or zero, or two) of those 10 machines responsible for all of them.
The Uncorrectable Error rate they found was about 1/20 of the correctable rate, a little over 1% per year. Assuming the same "use time" factor applies, this would mean that with ECC none of the 10 machines would be likely to ever crash due to bit flips, reducing the number of crashes by hundreds.
But this doesn't tell the full story either. Maybe it's best to view it as 1000 employees, each of whom is issued a laptop. If you buy them machines without ECC, the initial cost is less but 100 of them will experience frequent crashes during the lifetime of the laptop, leading to frustration, possible lost work, and early machine replacement.
If you pay more up front for ECC, the number of affected employees will be reduced from 100 to 5. How much extra one should be willing to pay for this reduction depends on circumstances, but I think the impact is quite different than merely avoiding 100 equally distributed random crashes over the course of 3 years.
I mean, how many companies do you know that give all their workers Xeon-powered laptops? My impression is that 99% of people have i3/5/7 powered machines at work.
Are companies really experiencing frequent RAM-caused failures on 10% of their employees' machines? Can anyone with such knowledge weigh in?
While reducing bitflip on all devices would be a nice step forward for consumer electronics, pragmatically I have to agree with you that it would have negligible impact in the real world. Those that need it generally already pay for it, and those that don't already find countless other and more frequent ways of breaking their machines.
I've seen users place PCs and Macs next to radiators, or surround their devices with folders and reference books, oblivious to the fact that computers can overheat and need ventilation.
I've seen users trying to use Excel as a database, then get impatient when Excel starts hanging or Windows starts running slow, and retaliate by pressing more buttons more rapidly, thus locking the entire OS up.
I've seen users who would just switch the power off at the wall every night. Over time, NTFS would corrupt to a point where random crashes would happen.
I've also seen people physically attack - kick and thump - their computer when it runs slowly, an application crashes, or just out of bad temper. This can damage the HDD and lead to more crashes.
There are also users who think they understand computers, so they like to tinker with lower level settings (IRQs, paging, CMOS settings, driver options, etc) then get surprised when things behave unexpectedly.
Most recently though, I had to explain to one guy who installed his own motherboard and accidentally cut a deep scratch into it, why his random crashes are potentially his own fault.
So I definitely blame users for most PC crashes. Even with the few kernel panics I've had over the last 15 or so years, I can attribute most of them to myself. eg I was experimenting with undocumented Windows APIs to change low-level behaviors. Or I was experimenting with running non-supported file systems as my main OS drive.
What's the worst that's going to happen? Some photo is going to get a single bit error? So what? The data doesn't need super high fidelity. What about a file system crash? Well that's what backups are for.
I would also like to note that I found the FreeNAS community completely horrible. They're focused solely on people setting up, like, HIPAA-compliant scientific data or something. It's ZFS for everything. Backups to USB drives are stupid, and you're stupid for having that as a backup strategy, and instead should drop another $1000 on a second NAS for your house, and then backup across a network.
OpenMediaVault's community on the other hand is much more helpful and open to both professional and personal use cases.
A fact that complicates this argument is the prevalence of compressed data formats. A wrong bit in a photo could corrupt the metadata, or a block of DCT coefficients, such that a whole image block or scan line does not decode, effectively ruining the photo. A wrong bit in a .tar.gz could well corrupt the entire archive. You never know where that bad bit will land.
(Yes, this has happened to me -- a single bit flip, probably from RAM error, corrupted a large .tar.gz -- I tracked it down to the bit that changed because I had another copy.)
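The fragility described above is easy to demonstrate. This is a quick sketch (the payload and the position of the flipped bit are arbitrary): compress some data with gzip, flip one bit in the middle of the stream, and decompression will almost always fail outright rather than produce a slightly noisy file.

```python
import gzip
import zlib

payload = b"important text " * 1000
archive = gzip.compress(payload)

# Flip a single bit in the middle of the compressed stream.
corrupted = bytearray(archive)
corrupted[len(corrupted) // 2] ^= 0x01

try:
    gzip.decompress(bytes(corrupted))
    print("decoded anyway (the bit happened to land somewhere harmless)")
except (OSError, zlib.error) as e:
    # Typical outcome: invalid deflate codes or a failed CRC check,
    # so the whole archive is unreadable because of one bit.
    print("archive ruined by one bit flip:", e)
```

Uncompressed text degrades gracefully (one wrong character); compressed or encrypted formats generally do not.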
In fairness to the FreeNAS community, USB drives often aren't a good backup solution:
-> USB thumb drives break all the time. They're a terrible choice for any long-term storage plan. So you're better off with an HDD / SSD
-> Keeping a USB device inserted all the time means the backup is permanently accessible. This is great for convenience but terrible for a backup as malware (eg ransomware) could access and write to your backup. And even if your backup isn't mounted, it could still be subject to any other physical disaster that knocks out your main archive of data (eg flooding). Backups should always be kept separate - ideally offsite.
-> So now you've got a removable HDD that you need to connect and disconnect frequently. You better buy something reasonably rugged because cheaper drives might fail due to frequent manhandling. Which means you're no longer looking at a budget device.
-> Finally, USB backup solutions can't easily be automated. So there's always the risk that someone lets their backups grow out of date, if not out of forgetfulness then just out of plain laziness (eg "I'm too busy to do a backup right now, I'll do it 'tomorrow'").
So while USB storage can be workable as a backup solution, there are a lot of caveats that would invalidate the whole point of a backup if one isn't careful.
I mean if you have an 8TB NAS, and you want to use an 8TB USB drive to backup said NAS locally, I'm not sure I see that as a problem. Backup doesn't HAVE to be significantly more durable than the source material, which is why multiple backups and at least one off site are recommended.
After my time doing support for iomega (so long ago), I don't consider anything really backed up unless there are at least 3 copies, and at least 1 of them is offsite.
The mid-to-high-end stuff is. The lower end is less so. But you pay a premium for decent storage capacity on a decent thumb drive, so either way, you're back to an external HDD / SSD drive.
> "and other USB storage effectively an HDD in an enclosure with a USB interface?"
Indeed. Hence why I said "So you're better off with an HDD / SSD" when referring to other USB storage devices.
> "I mean if you have an 8TB NAS, and you want to use an 8TB USB drive to backup said NAS locally, I'm not sure I see that as a problem. Backup doesn't HAVE to be significantly more durable than the source material, which is why multiple backups and at least one off site are recommended."
I don't have an issue with USB storage per se, I was just saying there are caveats to consider. And while you're right that backups don't have to be more durable than the source, you have to bear in mind that the kind of people who would be looking for advice about using USB devices as a backup solution would likely be the same kind of people who wouldn't have multiple USB devices, nor the kind of people who would test their backups to ensure the medium hasn't degraded. They also wouldn't likely be the same people to keep their backups offsite, as these kinds of checks and additions can add significant costs.
Or to put it another way, if one is unwilling to spend $1000 on a second NAS then they are unlikely to want to spend $1000 on a few decent external drives. So that person is going to start scaling back their requirements (cheaper drives, fewer drives, etc) and quickly end up in a situation where their backup solution is total garbage.
Bear in mind, the kind of people who would wander onto FreeNAS's forums looking for backup advice are unlikely to be people like you and I who understand how to implement these things correctly. So it doesn't surprise me that many members of the FreeNAS community veto recommending USB drives, knowing how easy it would be for someone inexperienced to get things wrong (eg leaving their USB device connected, forgetting that some malware would just write to the USB device as well).
I don't follow the reasoning here? It's not like ZFS gets worse if you're not using ECC RAM. Rather it's: ZFS is reliable enough that using ECC RAM becomes worthwhile. ZFS is still more reliable than ext4 even if you're not using ECC RAM.
E.g. with rsync, you use the --backup and --suffix flags.
But if the problem is the RAM or CPU, the bit could also flip before checksumming. Then the incorrectly written data will have a matching checksum and the corruption can't be detected.
That's why ECC is important.
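A toy sketch of that failure mode (the "record" and the single-byte flip are invented for illustration): if the bit flips in RAM before the filesystem computes its checksum, the checksum is computed over the already-bad data and verification happily passes.

```python
import hashlib

def store(data: bytes):
    # Checksum-on-write, ZFS-style: the checksum covers whatever is in RAM
    # at write time. A flip *before* this point gets "blessed" as valid.
    return data, hashlib.sha256(data).digest()

original = b"important record"
# Simulate a RAM bit flip that happens before the write path runs.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

blob, checksum = store(corrupted)
# Verification passes even though the data is wrong: the checksum
# faithfully describes corrupted data.
assert hashlib.sha256(blob).digest() == checksum
assert blob != original
```

On-disk checksums can only detect corruption that happens after they are computed; ECC is what protects the window before that.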
But notice that a bit-error in a text file in plain storage has some consequences. That same bit-error in a text file in a RAID (or encrypted storage) has very different consequences.
Some bad ideas are just worse than others.
Having an error on one disk, versus multiple disks in a RAID totally matters.
Serious question: Do they not teach Hamming codes in undergrad anymore?
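For anyone who hasn't seen them, a minimal Hamming(7,4) sketch (bit positions 1-7, parity bits at positions 1, 2, 4), just to illustrate the single-bit-correction idea behind ECC DRAM; real DDR ECC uses wider SECDED codes, but the principle is the same:

```python
def encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(c):
    """Locate and flip a single-bit error using the parity syndrome."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 0 = no error, else 1-indexed error position
    if pos:
        c[pos - 1] ^= 1
    return c

word = encode([1, 0, 1, 1])
word[4] ^= 1                    # simulate a single bit flip
assert correct(word) == encode([1, 0, 1, 1])
```

Any single flipped bit, data or parity, is located and repaired; two flips exceed what this code can correct.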
Writing corrupt data once or writing it a thousand times redundantly with parity data looks exactly the same, if it's allowed to happen. It's a valid write of invalid data. Either way, the data you have on your cheap laptop drive or your six-figure storage array is corrupt. This is why the ancestor commenter is correct about the storage technology being irrelevant to the choice to use or not use ECC.
I haven't been in school for many years. They taught Hamming codes, but they also taught garbage-in, garbage-out.
Isn't that because NTFS is Microsoft-proprietary? AFAIK all the existing third-party implementations are reverse-engineered and hence not 100% safe. Meanwhile even HFS+ has a public specification published by Apple.
It turns out that some RAID controllers (including the very popular LSI variety) + certain kernel versions result in pools and drives being dropped from the array for no discernible reason. Nothing [useful] in the logs, just reboot and pray.
Non-ECC RAM is just asking for trouble in any circumstance where you care about both data integrity and IOPS. Bitflips can and will get you. Granted, they won't wind up hosing your entire pool (most likely) like some apocryphal stories suggest, but you'll get bad data replicated and the possibility of borking some other sector during the 'recovery' process from the bitflip.
When you're in a large enough pool (PB-scale) this becomes even more painful...
And it's also a fair point that for enterprise storage, running at 80%-90% of capacity is a reasonable restriction, whereas the drive on my laptop is basically always 99% full. It will be interesting to see how APFS addresses this, since I'd guess that most of their users look more like my laptop than my file server.
Don't get me wrong, I like ZFS/btrfs; I adore snapshot send/receive. At times, though, it really handicaps itself.
In short, if you want more consistent performance as the pool fills, disable metaslab_lba_weighting_enabled, but be prepared to lose some sequential performance when the pool is empty.
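On Linux OpenZFS, a tunable like this is exposed as a kernel module parameter; a sketch of how that typically looks (the sysfs path follows the usual module-parameter convention, and the modprobe.d persistence snippet is an assumption about your setup):

```shell
# Disable LBA weighting at runtime (requires root; takes effect immediately).
echo 0 > /sys/module/zfs/parameters/metaslab_lba_weighting_enabled

# To persist across reboots, set the module option at load time:
#   /etc/modprobe.d/zfs.conf
#   options zfs metaslab_lba_weighting_enabled=0
```

LBA weighting favors outer (faster) tracks of spinning disks, which is where the empty-pool sequential-performance benefit comes from; it's also meaningless on SSDs.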
* no reserved space on the disk means you can fill it up and then not be able to delete files or extend it because there was no space for COW operations. Sun's answer? Delete the filesystem, recreate and restore from backup.
* zpool management is a pain in the arse generally, and the zpool/zfs distinction is pretty clearly designed on the assumption you'll have sysadmins paying a lot of attention to the low-level FS and pool management.
Imagine Steve Jobs being told "your disk filled up and now you need to reformat it". Apple didn't reject ZFS because (whatever some embittered Sun blogger says) of NIH. They rejected it because it wasn't fit for purpose as a consumer desktop filesystem.
The first time I encountered the disk-full problem was 2009. There were emails from Sun engineers saying "this is a dumb situation, we should have a reserved block of storage to avoid this" from 2006, and nothing had been done. Why would anyone at Apple believe it was going to get better if Sun's response to paying customers was "reformat and rebuild" for at least three years?
There was no way of fixing the limitations around vdevs, like not being able to grow existing arrays, not mixing block sizes without a big performance hit, and so on, without a huge overhaul of some of the fundamentals.
ZFS was, like a lot of Sun technologies, built on assumptions from a world where machines are scarce, have a high ratio of admins to machines, and many users per machine. It's excellent at meeting its design goals, but there is no way I can imagine Apple shipping it when they'd be telling people, "want to grow your Mac Pro's array? Rebuild it. Want to reclaim space on a full system? Rebuild it. Want to avoid dropping off a performance cliff? Never drop below 20% free space."
(As an aside, I've been told by Sun engineers from around the same 2008-2009 era that dependency resolution in package managers for distros proved Linux was a shit toy for morons, because "real" OSes assumed skilled engineers who would be inspecting patches package-by-package so they knew exactly what would be on the system. Even recently, with Solaris 11 shipping with a package manager, I've had Sun veterans complain that "Solaris is dead" because it's "too much like Linux". Sun appears to have fostered, at least in areas I've dealt with, a culture that's completely out-of-step with how computing works this century. I doubt it would have taken too many of those sorts of conversations to convince people at Apple that Sun just didn't get Apple's "just works" vision.)
You mention it briefly, and I believe that ZFS was designed for a world where growing existing arrays, reaching 80+% utilization, or using different block sizes are all things that just never happen. ZFS could do with a rewrite to allow them, but it wouldn't be ZFS-compatible anymore.
But I won't deny that what you describe could be a telltale sign of a development team Apple simply cannot work with. But nothing prevents Apple from forking OpenZFS and working on it in house (perhaps apart from patent issues).
This worries me because Raid-Z2 with two hot spares is exactly the system that I was planning to set up.
That, combined with using Seagate 7200rpm drives in a year that was particularly bad for them (an offline storage company mentioned them in a blog article, iirc), was just a bad chain of events. When I pulled the drives out and tested them individually after the entire array crashed, about 7 out of 12 had significant issues, and of the other 5, 2 more developed errors shortly after (I only used them as scratch/project drives).
Honestly, today, I'd probably get another Synology box. A bit more expensive, but it's been so much less hassle. I had upgraded my 4-drive (2010 plus model, iirc) Synology box to 4TB WD Red drives, and have been running that ever since. I don't think I'd ever do a homebrew NAS on a single machine again. If I had to do something in a company, I'd probably lean towards distributed file systems for that level of redundancy.
My Synology NAS device has been in service for around 6 years now... I upgraded the drives from 4x1TB to 4x4TB when the FreeNAS server bit the dust... and it's still going strong... I don't have to think about it, it just sits there and runs. Single-purpose software like this should just do its job. The end.
The main big data file system I am keeping my eye on is HAMMER2... but development is slow. I really like a lot of things Matthew Dillon is doing there though, especially the network stack. If I were doing a new isp or wisp I would probably be basing it on dflybsd.
ZFS has this sterling reputation, but a fair number of people have actually lost data to it. Some of it is bad hardware, or bad choices in hardware, and some of it is bad administration, but it's still happened.
I don't expect any filesystem to eliminate the need for actual backups, but
ZFS would have been totally brilliant on a 90s-style multiuser enterprise or higher education deployment, though. It just missed its window.
It seems everyone at some point expected Btrfs to shoot ahead and leave other file systems in the dust, so there was no point in bothering with ZFS, "just wait a bit and Btrfs will be the default everywhere". And besides ZFS has all the legal issues with it.
But it seems Btrfs progress was rather slow, so even in spite of the legal issues, interest in ZFS is still growing.
Btrfs was not an fs in wide production use the way xfs or ext3 / ext4 were, so it's understandable that btrfs progress is slow. The two pieces of software are just really far apart in timelines and maturity. People are paying attention to ZFS mainly because it's had a history of working and functioning in production environments and in heavy load, for a decade now.
In my opinion, in order to have parity with ZFS, some big distribution (bigger than Suse, maybe Debian or Ubuntu) needs to set btrfs as its default fs, and have banks, stock exchanges, dns providers, virtual hosting companies, etc hammer the hell out of it for another 6 or 7 years and get all the bugs that come out of those experiences fixed. Then people will start paying attention to it.
afaik btrfs development started 2007 (https://en.wikipedia.org/wiki/Btrfs)
ZFS was introduced as a part of Solaris in 2005, released as openZFS in 2006 (https://en.wikipedia.org/wiki/ZFS)
I want to like it, we really need a nice, stable next-generation file system in Linux land (that doesn't have the licensing issues associated with ZFS)
It has been my experience using BTRFS for my own personal data that it's the less-tested code paths which had / might have the issues you're describing. Things like sparse files (I got bit by this bug), snapshots (regular scrubbing and backups are things you're doing already, right?), and other features that are less frequently used.
A small production cluster where I work uses XFS over BTRFS though, because of the very FUD that's mentioned and the type of storage happening on it matching the sparse pattern (even though that bug should be fixed).
Not that ZFS won’t run well on SSD, but it feels like there’s a gulf between filesystems designed with SSDs in mind and those designed with spindles in mind.
On the flip side, with spinning storage approaching 10TB drives, enhanced filesystems for RAID arrays of those beasts are needed as well.
In general, we really need some patent releases from those companies strangling filesystem advancements (namely Oracle and MS).
[edit: it was a BSD Now episode]
At that point the word came down from Apple Legal to Engineering that it wasn't happening. And there's nothing more insurmountable than your own legal department.
Just to note: It doesn't really work this way, even at Apple.
It really means "legal was better at arguing their side than whoever argued against them to the SVP/CEO who made the decision"
While Apple is worse than most (from talking to friends and counterparts), no company, even Apple, is so silly that the legal department can't be overruled (any general-purpose software company that works like that goes bankrupt pretty quickly).
It just takes a really good business case for doing so, and none existed here.
That is, if it had been considered super-mission-critical, it just would have happened anyway; they would have taken the risk.
(I kind of just hate when these stories get portrayed as legal ruling with an iron fist with nobody having any say over them.)
My question is why Apple thought they needed to make a deal to use ZFS in the first place, and if so, has Canonical (who says they'll ship OpenZFS with Ubuntu) made a deal with Oracle?
It's true that OpenZFS is more than Oracle's ZFS, but unless I'm mistaken, the vast majority of code in that project is still owned by Oracle.
This article makes me uneasy.
Rumors in any case.
How... unnecessarily inflammatory. Hardlinked backups work remarkably well, and are incredibly simple to implement and understand. Of course they can be corrupted, but then again, so can every other form of backup in existence (that said, there are no protections built-in to a hardlinked backup).
Most of this is indeed directly related to directory backups. I also do Snapshots of sorts, but it only makes hardlinks to files, not directories. That proves to be remarkably more reliable.
Anyone else here using it?
(I'm also frustrated that it's not possible to change redundancy within a vdev — e.g. go from RAID-Z1 to RAID-Z2.)
At first, I assumed that RAIDZ was the obvious way to go. But I switched to a concatenation of mirrored pairs; I grow the array by adding another pair to it.
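A sketch of that layout with standard zpool commands (pool name and disk paths are illustrative): each `zpool add` grows the pool by one more mirrored pair, which is exactly the incremental-growth story RAID-Z vdevs can't offer.

```shell
# Start with one mirrored pair...
zpool create tank mirror /dev/sda /dev/sdb

# ...and grow the pool later by striping in another mirror vdev.
zpool add tank mirror /dev/sdc /dev/sdd

zpool status tank
```

As a bonus, individual mirror vdevs can also be upgraded in place by replacing both disks of one pair with larger ones.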
I've actually had six disks fail in six months, without data loss. Was scary. But wow.