HFS+ Bit Rot

chuckup · on June 11, 2014

I'm surprised how little is done to protect data in modern systems. The two big things we should be doing - ECC ram and checksumming filesystems - are still nowhere close to being mainstream.

I recently bought and returned three(!) 3TB drives because I found they were silently corrupting data every 500GB of writes or so (verified by testing on multiple systems). I switched to 2TB drives from another brand and had zero issues. I only knew there was a problem because I wrote a burn-in script to copy & verify over and over. The file system and the OS do not check; they assume it's the drive's job.
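For anyone who wants to try the same kind of test, here is a minimal sketch of a write-and-verify burn-in pass. It is not the script mentioned above; the function names and the SHA-256 comparison are my own assumptions about how such a script could look.

```python
import hashlib
import os
import tempfile


def burn_in_pass(directory, size_bytes, chunk_bytes=1 << 20):
    """Write random data to a scratch file, read it back, compare SHA-256 digests.

    Returns True if the data read back matches what was written.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                chunk = os.urandom(min(chunk_bytes, remaining))
                written.update(chunk)
                f.write(chunk)
                remaining -= len(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data out of Python and OS write buffers

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_bytes), b""):
                read_back.update(chunk)
        return written.digest() == read_back.digest()
    finally:
        os.remove(path)  # always clean up the scratch file


def burn_in(directory, size_bytes, passes):
    """Run several passes and return how many of them failed verification."""
    return sum(0 if burn_in_pass(directory, size_bytes) else 1 for _ in range(passes))
```

One caveat: a read that follows a write closely may be served from the OS page cache rather than the platters, so a real burn-in probably needs files larger than RAM (or a remount between write and verify) to actually exercise the drive.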

Almost every USB stick I've ever used eventually had some small corruption issue - again, no way to catch this unless you are looking for this kind of thing.

The average consumer does not think this is even possible. The idea of your data silently getting scrambled seems impossible, like a car randomly being unable to brake; it is just assumed this sort of thing does not happen. But as hard drives get bigger and people put more RAM in their systems I think this will become a huge issue. Of course, consumers will blame "a virus" or something along those lines.

binarycrusader · on June 11, 2014

It doesn't help that Intel has artificially limited ECC support to specific processors. As the amount of memory in PCs and the frequency at which components operate both increase, the probability of memory errors increases, yet Intel still foolishly refuses to support ECC in their consumer line, selling it as a "server" feature and limiting it to Xeons.

chuckup · on June 11, 2014

Linus posted about this a while ago: http://www.realworldtech.com/forum/?threadid=114758&curposti...

"Who the f*ck are they to send me reliability patches, when they can't even get the basics right?"

I remember reading Microsoft made a push to get hardware vendors to use ECC RAM with Vista - they recognized a lot of crashes were due to memory errors (but XP would get the blame). No go.

theandrewbailey · on June 11, 2014

Perhaps AMD could enable ECC on their consumer CPUs?

binarycrusader · on June 11, 2014

AMD does for their Athlon FX processors, but support is mostly dependent on the motherboard. With that said, most of the AMD motherboards for AM3+ support it, though it's limited to unbuffered ECC.

Unfortunately, that means choosing ECC over power and performance vs. Intel. It's really a sad state of affairs.

Freaky · on June 11, 2014

They do. Motherboard vendors often fail to support it, unfortunately, but the CPUs generally aren't the weak link.

nandhp · on June 11, 2014

> HFS+ is seriously old

Sure, in computing terms 1998 is an eon ago. But that's not a good reason to stop using a file system. Lots of older file systems are still in use: FAT32 (1996; still the default for SD cards <32GB and anytime you need a cross-platform filesystem), XFS (1994; the new default filesystem for RHEL7), NTFS (July 1993; WinFS still hasn't materialized), and ext2 (January 1993; still commonly used, particularly in situations where a Journal is not required).

Of course, I'm perfectly happy to believe that HFS+ is more badly-designed than any of the other filesystems from that time. But a filesystem doesn't need to be replaced just because it's no longer trendy.

MBCook · on June 11, 2014

Here's the thing. I used FAT16 for years and years and years and never had a problem. I used FAT32 for the majority of a decade and never had a problem. I used NTFS for a number of years and never had a problem.

While HFS+ isn't quite as bad right now, I can tell you that in the 10.4-10.5 days it seemed to enjoy corrupting files. You rebooted your computer? Here's a couple of missing files in your trash. Hope you can find what binary thing on your multi-hundred-gigabyte disk they fit in.

HFS+ is the ONLY filesystem I have ever used that I have lost bits of data on from silent corruption. I don't trust it. I would kill for Apple to adopt ZFS or NTFS. At this point I want it checksumming all of my files and my filesystem so I know that it's not being corrupted silently by the OS making weird mistakes on a filesystem that should've been replaced 10 years ago and is just hack after hack on a system designed for a computer that only had 128 K of memory.

I love everything about OS X. Except HFS plus, which needs to die in a fire.

Nothing is quite as much fun as looking at an old picture or listening to a song I haven't listened to in quite a while just to find that it's actually silently corrupted and has been for multiple years and I probably don't have the correct version on a backup.

I'd trust ext2 over HFS+.

Yes, I'm bitter.

iSnow · on June 11, 2014

>I used FAT16 for years and years and years and never had a problem. I used FAT32 for the majority of a decade and never had a problem

>I'd trust ext2 over HFS+.

>Yes, I'm bitter.

Unfortunately not only bitter but bitter to the point of senseless argument just for the sake of argument. Trusting a non-journaled FS over a journaled one without error correction/detection is a bad choice, like career-limiting bad. And if you never had problems with FATXX then you most likely never used DOS or old Windows seriously. Corrupted data and specifically a corrupted MBR were really frequent back then (like corrupted b-trees on the Mac side).

Dylan16807 · on June 11, 2014

Most journals are metadata-only, so they save you from corrupt directories but don't protect contents at all. NTFS loves to wipe out just-written files when I lose power, for example. Files that previously existed, mind you, now suddenly garbage.

lloeki · on June 11, 2014

> And if you never had problems with FATXX then you most likely never used DOS or old Windows seriously

Ah, the joy of "ghost" folders that ended up containing themselves...

MBCook · on June 11, 2014

It's journaled now. I used DOS and Windows for years and never had any real problems (well, on hard drives, floppies were more vulnerable).

frik · on June 11, 2014

Afaik, Mac OS X had native ZFS support around 10.5 (but not as a boot partition).

And there is http://code.google.com/p/maczfs/ and http://downloads.maczfs.org/

klausa · on June 11, 2014

They promised it at WWDC when they announced 10.5, but silently scrapped the feature before release.

msbarnett · on June 11, 2014

ZFS was never promised as the way forward for the default filesystem on consumer machines.

The truth is, despite all the Siracusa "We Want ZFS" nonsense, it's an absolutely terrible choice for a consumer filesystem. All of the nice features come at the cost of it eating an enormous amount of RAM and CPU resources, and its integrity guarantees are meaningless if the machine isn't running ECC RAM anyways.

Apple was never going to ship laptops running 10.5 where 4 Gigs of RAM were immediately eaten by the filesystem and battery life was halved from 10.4 due to the extra CPU load. ZFS as the default for OS X was always pipe-dream nonsense promoted by people with no understanding of or experience with ZFS.

laumars · on June 12, 2014

And equally you're exaggerating the demands ZFS has; it runs quite happily on lower-end hardware. Granted you need a 64-bit CPU, but all Macs are shipped with one anyway. And unless you have deduping et al turned on (which you don't really want on a desktop anyway), the CPU load would be minimal. You could probably even get away without compression as well (though the modern algorithms ZFS supports are pretty light on the CPU).

As for memory, the minimum you need for pre-fetching is 2GB of free RAM. ZFS will, however, run on less with pre-fetching turned off.

The thing with ZFS is that it will consume your RAM and CPU if you have all the uber features enabled and a boat load of pools created, but since you don't need nor want that on a desktop, ZFS shouldn't need to be that resource hungry.

What's more, part of the misconception of ZFS's bloat comes from the age of the file system. My original ZFS server was less powerful than most budget laptops these days (and I was running a small number of VMs off that thing as well)! So modern systems definitely do have the resources to run ZFS as their desktop fs.

oakwhiz · on June 12, 2014

I still trust non-ECC RAM a lot more than my disk.

aktau · on June 12, 2014

Behold what I believe is the spiritual successor and which I'm going to play around with pretty soon: https://openzfsonosx.org/

HFS+ hasn't exactly failed me spectacularly, but I'll be more at ease with my sensitive and valuable data on ZFS.

duskwuff · on June 11, 2014

NTFS isn't a single bit better than HFS+ in terms of checksums.

Also, HFS wasn't used on the original Macintosh 128K -- that used a simpler filesystem called MFS. HFS was introduced with the Mac Plus a couple of years later.

MBCook · on June 11, 2014

NTFS may not checksum everything, but I also never had it silently corrupt my files and helpfully provide me the 'missing parts' when I rebooted.

rbanffy · on June 11, 2014

The most terrible thing with silent corruption is that, unless you verify your data, you'll never know it happened because it's, well, silent.

Have you checksummed your files lately?
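If you want to start, a minimal sketch of a checksum manifest looks something like the following. The function names are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import os


def file_digest(path, chunk_bytes=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_bytes), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify_manifest(root, manifest):
    """Return (changed, missing): paths whose digest differs, and paths that are gone."""
    changed, missing = [], []
    for rel, digest in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            missing.append(rel)
        elif file_digest(full) != digest:
            changed.append(rel)
    return changed, missing
```

Build the manifest once, store it somewhere safe, and re-run `verify_manifest` periodically. Note that this cannot tell an intentional edit from corruption on its own; it only tells you that the bytes changed.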

Freaky · on June 11, 2014

I've experienced silent corruption on NTFS. And it still modifies data in-place, so an unlucky power failure or reboot could still cause data loss. Thankfully Microsoft already have an answer, ReFS: http://msdn.microsoft.com/en-us/library/windows/desktop/hh84...

Elhana · on June 11, 2014

Silent corruption can hit NTFS just as well. The only protection is to have good hardware with ECC RAM, RAID with redundancy, an FS that checks data on every read (like ZFS, btrfs, ReFS, ...), and backups.

gtaylor · on June 11, 2014

> But a filesystem doesn't need to be replaced just because it's no longer trendy.

The author wasn't just saying that its age was damning, it was the fact that the FS hasn't changed much since 1998.

Since XFS was mentioned, I wanted to mention that it's hard to compare to HFS. The author mentions HFS' stagnation, but XFS has seen tons of improvement since 1994. It seems like I'm reading about some cool new XFS development every year. NTFS has made progress since its early years, too.

coldtea · on June 11, 2014

>The author wasn't just saying that its age was damning, it was the fact that the FS hasn't changed much since 1998.

And yet, he goes to mention tons of changes to the FS over the years, including journaling...

zymhan · on June 11, 2014

Yeah, I'm not sure what the difference is between what the author calls "hacks" and others call "updates."

jackjeff · on June 11, 2014

Some of these updates are clearly hacks, in the sense that if you did not have to worry about backward compatibility, they would certainly not be implemented in that way.

They're not like upgrades from ext2 to ext3, it's like ext2 was modified in a weird way to support new features that older ext2 implementations do not know about.

Hard links, for example, are implemented by adding files which look a lot like symlinks. As soon as you create hard links to a file, the file is moved into an invisible folder named "Private Data" and becomes a "node file". The original file is replaced by a stub containing the CNID (special type, special content). That's not as efficient as having inodes and hard links from day 1. If you mount such a volume on Mac OS Classic, the "Private Data" folder is totally accessible and modifiable by the user, and of course hard links do not work.

Hot file clustering and journaling are implemented in similar ways to hard links (invisible folders).

The opposite example is extended attributes (xattr), which were added in 10.4. A special filesystem structure named the "Attributes File" was reserved in HFS+ from day 1 but never used. Defragmentation does not rely on any changes to the volume; it's purely done in the filesystem driver stack.

Things like Copy-on-Write or Snapshotting, are probably impossible to implement in a backward compatible manner though...

jes5199 · on June 11, 2014

> They're not like upgrades from ext2 to ext3, it's like ext2 was modified in a weird way to support new features that older ext2 implementations do not know about.

It's my understanding that this is exactly what ext3 is, though! It's designed to look like ext2, but with some extra hidden stuff for journaling...

danieldk · on June 11, 2014

Yes, in fact, we used to convert ext2 to ext3 with one single command:

  tune2fs -j /dev/<device>

This adds a journal to the filesystem, making it ext3. In fact, you could mount ext3 as ext2 (you probably still can). It would just not replay the journal.

The same is pretty much true for ext4: it primarily consists of extensions to ext3, but was forked as a separate filesystem in order not to destabilise ext3. You can mount an ext2 or ext3 filesystem as ext4. You can mount an ext4 filesystem without extents as ext3.

It should also be noted that ext2 is strongly influenced by UFS1 (FFS), which has been around forever.

ext[234] is not a modern filesystem. However, it is very well understood and stable.

noselasd · on June 11, 2014

Indeed, neither ext3 nor ext4 would have been implemented the way it is if it didn't have to worry about backwards compatibility.

zymhan · on June 11, 2014

How often can you actually implement new features in an entrenched filesystem and not have to worry about backwards compatibility? This is the case for ext2/3/4, NTFS, and I'm sure many others.

laumars · on June 11, 2014

Nobody is suggesting that things are bad because they're old. They're saying that HFS+ models older design practices that aren't relevant for modern file system designs and expectations. And what's more, you're kind of disproving your point with those examples:

* FAT32 is frequently criticised for being crap and the FAT32 variants that fix many of its shortcomings are often stuck behind MS patents. In short, FAT32 is a terrible file system that needs to die more urgently than HFS+.

* ext2 isn't really used for anything other than a direct replacement for FAT32. It's not really a practical fs for modern systems and shouldn't really be used on one.

* NTFS isn't a static file system. It's like saying ext is decades old when ext4 is practically a whole other file system to ext2 (while still offering some degree of backwards compatibility). NTFS is similar in that it has incremental versions. However even then, NTFS does still have its critics and, as you mentioned yourself, MS have tried to replace it on a few occasions.

XFS is really the only example you've come up with that works in your context. It's also one of the few file systems I don't have any personal experience with so I couldn't answer how it's managed to keep up with the pace of technology.

greyfade · on June 11, 2014

Age isn't the argument that I see. The writer is complaining that HFS+ has failed to keep up with modern needs.

HFS+, very much unlike nearly every other filesystem in existence, was designed with only one real feature in mind: associating "resources" with "files." As I understand, HFS+ has two master files; one contains icons, filenames, metadata, etc., and the other contains the file data streams. This design is extremely prone to fragmentation, and it was created with static oversimplified data structures that are not forward-compatible with advances in storage features and capacity.

As a result, HFS+ simply can't meet the needs of a modern computer user the way virtually any other *nix-ish inode filesystem can. It doesn't have room in its data structures for proper error correction or failure recovery, and it's impossible to achieve atomicity or any reasonable level of reliability and performance.

Virtually every other modern filesystem has those attributes, and for good reason: They're supposed to be reliable.

ricardobeat · on June 11, 2014

Are filesystem-related issues common among the half billion devices running OSX/iOS? I believe hardware completely overshadows software re. storage reliability and performance.

frakturfreund · on June 11, 2014

Instead of using ext2, just use ext4 (2008) without a journal to get 16 Years of Progress :)

mzs · on June 11, 2014

The two biggest problems I have with HFS+ are no sparse file support and the catalog file (limits concurrency). It would be nice to have timestamps better than 1s too.

userbinator · on June 11, 2014

Hard drives (of both the magnetic and flash-based variety) all have built-in error detection and correction. If you are getting corrupt files that's not the filesystem's fault, it's most likely a problem with the hardware.

Checksums at the FS level are very rare; the majority of the ones in use don't have them (http://en.wikipedia.org/wiki/Comparison_of_file_systems) and yet they function perfectly fine. HFS+ is not the problem here.

acdha · on June 11, 2014

The problem is that there are other sources of error - e.g. data corruption in transit rather than on the disk itself – and the legacy methods have error rates which are too high for modern data volumes. There are a couple of implementation problems as well: the lower-level error correction mechanisms tend to hide information from the higher-level interfaces, making it hard to measure real error rates, and some classes of errors aren't randomly distributed and are more likely to produce errors which simple schemes can't detect.

http://queue.acm.org/detail.cfm?id=1317403 is a good article by someone at NetApp describing everything which can go wrong with hard drives, including this class of error.

There are two good papers on measured real-world error rates:

http://indico.cern.ch/event/13797/session/0/material/paper/1... http://www.cs.toronto.edu/~bianca/papers/fast08.pdf

The good news is that many of these errors were caught, but there are examples which were not, and the real message is that the entire stack has enough complexity lurking in it that you wouldn't want to simply assume it handles something as critical as data integrity. Something like the ZFS / btrfs approach is nice because it doesn't depend on all of those layers working as expected, is guaranteed to be monitorable and is much less likely to silently change without notice.

thrownaway2424 · on June 11, 2014

I thought one of those links was going to be "Parity Lost and Parity Regained", but no? One of them is by the same authors, on the same topic, from the proceedings of the same conference, but it's a different paper? Weird.

http://research.cs.wisc.edu/adsl/Publications/parity-fast08....

acdha · on June 11, 2014

Complete oversight on my part – thanks for the link.

jsz0 · on June 11, 2014

ZFS / btrfs are great for their intended purposes but using them on devices with limited RAM and/or connected via external bus/power would likely introduce a whole new set of problems.

yungchin · on June 11, 2014

What's the problem with having those on external buses?

Freaky · on June 11, 2014

> If you are getting corrupt files that's not the filesystem's fault, it's most likely a problem with the hardware.

I don't think the implication is that the fs is at fault for the corruption - it's just at fault for failing to detect it. Hardware problems tend toward certainty over long enough time scales - doesn't it make sense to defend against it given the relatively minimal cost of doing so?

> Checksums at the FS level are very rare; the majority of the ones in use don't have them .. and yet they function perfectly fine.

No, they too allow data to silently become corrupt in face of imperfectly functioning hardware. Sure, it normally doesn't happen, but it's certainly not rare enough to warrant ignoring if your data is in any way valuable to you.

rcthompson · on June 11, 2014

Are you saying that hard drives store everything on disk with checksums or redundancy? That would be news to me. How else could they correct (or even detect) errors?

Anyway, are you arguing that filesystems shouldn't bother with checksumming at all?

kevinday · on June 11, 2014

Hard drives store things with varying forms of ECC. Each sector has an ECC field, allowing it to detect many errors and automatically correct some.

This isn't a replacement for something better, it just allows simple bit errors to be corrected automatically by the drive. The problem is that it's not really obvious when it's happening, and you only notice when it can't fix something. Drives eventually throw a SMART error when it's had to do too many corrections though.
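To make "detect many errors and automatically correct some" concrete, here is a toy sketch using a Hamming(7,4) code. This is only a stand-in chosen because it fits in a few lines; real drives use much stronger codes over whole sectors, and nothing here reflects any particular drive's firmware.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); syndrome 0 means no error was seen.

    A non-zero syndrome is the 1-based position of the bit that gets flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # correct the single bit the syndrome points at
    return [c[2], c[4], c[5], c[6]], syndrome
```

Any single flipped bit is repaired transparently, which is exactly why you never hear about it. Flip two bits, though, and this toy code "corrects" the wrong position and hands back bad data without complaint, which is the small-scale version of the failure mode being discussed in this thread.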

userbinator · on June 11, 2014

Correctable errors are logged by SMART - see the Raw Read Error Rate, Seek Error Rate, Hardware ECC Recovered, and related attributes.

evntdrvn · on June 11, 2014

There are many levels and types of error checking performed in the various layers between the physical signals on the media and the bits that get sent over the drive interface. They are not simple checksums either. Otherwise a modern hard drive would not function...

mnw21cam · on June 11, 2014

Most modern media does error checking, and usually error correction as well. This isn't primarily because the media is unreliable, although that is true. The main reason to do ECC is to increase the capacity of the media. Having ECC able to cope with a reasonable number of bit errors allows the bits to be crammed closer together on the media while still being reliable.

ksec · on June 11, 2014

The reason why ZFS didn't pan out was simply that it was using too much CPU and memory. Not something worth considering when the majority of Apple's devices now are mobile (phone / tablet / laptop).

It would have been great if Time Capsule, or an Apple NAS, used ZFS. But it seems Apple will likely want you to move everything to iCloud Drive (finally!).

I think Apple's new filesystem will be designed entirely for flash. Something similar to Samsung's F2FS. Since F2FS is GPLv2 licensed, it is not possible for Apple to use it within their own kernel.

danieldk · on June 11, 2014

> The reason why ZFS didn't pan out was simply that it was using too much CPU and memory.

I am not sure that is the reason. Apple did already announce it at WWDC after all. ZFS on OS X was announced in June 2007. In September 2007, NetApp sued Sun over patents violations in ZFS.

It's likely that Apple didn't want a patent suit after adopting ZFS as their main file system. 2007 Apple was of a completely different size than 2014 Apple.

msbarnett · on June 11, 2014

They announced support for it, not that it was the new default. Big difference.

ZFS support made sense for Mac OS X Server back in 2007. It's a beefy server filesystem that rewards beefy servers with gobs of ECC RAM, RAID, and no battery to worry about.

ZFS as default replacement for all of Apple's HFS+ usecases (laptops, iPods, phones and tablets in the works) made no sense in 2007 and makes no sense in 2014. ZFS is simply too resource intensive and too dependent on ECC RAM even now for consumer use cases.

rodgerd · on June 11, 2014

ZFS isn't particularly consumer-friendly, either. Explaining why you can't delete files from a 100% full filesystem, or the pain and complexity of trying to move from 512 byte to 4096 byte sectors and so on would be a nightmare. And of course the design assumes high-quality hardware (ECC RAM for example).

XorNot · on June 11, 2014

The design doesn't assume ECC RAM. It's recommended because you can't talk about end-to-end checksumming and then not explain that it can't protect you against unreliable memory.

All other filesystems are equally susceptible - if your memory is getting errors, they'll happily write those to disk too.

rodgerd · on June 11, 2014

Actually, XFS checksumming will alert on errors caused by bad RAM.

XorNot · on June 12, 2014

Depends what you mean by this. ZFS will alert if it reads data into ram that then doesn't match the checksum during verification.

But it can't prevent a bitflip of data in memory from getting written to disk before the checksum is calculated. Nor can it prevent data being bitflipped after it's been verified and handed off to the application.

Which I'm pretty sure XFS can't do either.
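A tiny sketch of the ordering problem, for illustration only. CRC32 and the dict-based "block" are stand-ins I picked for brevity; this is not how ZFS stores or checksums anything internally.

```python
import zlib


def flip_bit(data, bit_index):
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def write_block(data):
    """Stand-in for a checksumming write path: store the data with its checksum."""
    return {"data": data, "checksum": zlib.crc32(data)}


def read_block(block):
    """Return the data, or raise if it no longer matches the stored checksum."""
    if zlib.crc32(block["data"]) != block["checksum"]:
        raise IOError("checksum mismatch")
    return block["data"]


original = b"important family photo"

# Case 1: the bit flips in RAM before the checksum is computed.
# The checksum faithfully covers the already-corrupt data, so reads succeed.
silently_bad = write_block(flip_bit(original, 3))

# Case 2: the bit flips on disk after a clean write.
# The stored checksum no longer matches, so the read path can raise an error.
rotted = write_block(original)
rotted["data"] = flip_bit(rotted["data"], 3)
```

`read_block(silently_bad)` returns the corrupted bytes with no error, while `read_block(rotted)` raises. The checksum only protects data from the moment it was computed onward.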

orkoden · on June 11, 2014

Apple could just as well use HFS+ on iOS and ZFS on OS X until the hardware catches up.

msbarnett · on June 11, 2014

It's not just phones that have to catch up. The CPU and RAM requirements would wreak havoc on laptop battery life as well.

hosay123 · on June 11, 2014

I don't know the numbers, but the probability of getting 26 corrupted at-rest files through natural causes sounds pretty much like winning the lottery twice on the same day you were struck by lightning twice

Checksums wouldn't have fixed this, they'd only alert the user to the fact the damage had already been done, which is exactly what the decompressor did in its own special way.

As another comment points out, error correcting codes are the way to handle this, and it's already done in hardware, and probably too expensive to do in software in the general case.

rtpg · on June 11, 2014

I think you're discounting the utility of simple checks.

Imagine you have weekly backups, but only for the past 26 weeks because of disk space. A file gets corrupted, but you only view it a year afterwards. You effectively have no way of recovering it.

Knowing something is wrong can be useful (though being able to fix it is even more useful).

Sami_Lehtinen · on June 11, 2014

That's exactly why I use par2 with all important data when backing it up. Bit corruption? So what?
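For anyone wondering how recovery data can shrug off corruption, here is a deliberately simplified sketch using a single XOR parity block. To be clear, this is not par2's algorithm (par2 uses Reed-Solomon coding and can repair multiple blocks); it only illustrates the general idea of rebuilding a known-bad block from the survivors.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of equal-length blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(parity):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(blocks, parity, missing_index):
    """Reconstruct blocks[missing_index] from the other blocks plus the parity block."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_parity(survivors + [parity])
```

With one parity block you can rebuild exactly one damaged block, and you need a per-block checksum to know which one is damaged in the first place.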

clord · on June 11, 2014

You're right, that reported failure rate is way too low, I'd be glad to only have 26 corrupt files. I have one disk with about 300GB of CR2 files (HFS+). Not long ago I did a similar procedure as in the article (except that I also had the files on a ZFS mirror; I was cleaning up the HFS disk and checking for duplicates before formatting) and found 250 files with bit errors compared to the reference. Fortunately only about 1 in 3 of the bit errors resulted in corrupt files.
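A comparison like that can be sketched roughly as follows. This is not the procedure from the article or the one I actually ran, just a minimal illustration; the function names are invented, and it only looks at files that exist in both trees with the same size.

```python
import os


def count_bit_errors(path_a, path_b, chunk_bytes=1 << 16):
    """Count differing bits between two files of equal length."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        raise ValueError("files differ in length")
    differing = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_bytes)
            b = fb.read(chunk_bytes)
            if not a:
                break
            if a != b:
                # XOR the chunks as big integers and count the set bits
                x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
                differing += bin(x).count("1")
    return differing


def compare_trees(reference_root, suspect_root):
    """Map relative path -> number of flipped bits, for same-size files that differ."""
    damaged = {}
    for dirpath, _dirnames, filenames in os.walk(reference_root):
        for name in filenames:
            ref = os.path.join(dirpath, name)
            rel = os.path.relpath(ref, reference_root)
            other = os.path.join(suspect_root, rel)
            if os.path.isfile(other) and os.path.getsize(ref) == os.path.getsize(other):
                bits = count_bit_errors(ref, other)
                if bits:
                    damaged[rel] = bits
    return damaged
```

A file that differs by only a handful of bits from the reference is a strong hint of bit rot rather than a deliberate edit.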

Needless to say I no longer keep anything of consequence on non-checksummed filesystems now.

Buge · on June 11, 2014

It annoys me when someone says "you're right" and then completely disagrees with the first person.

vertis · on June 11, 2014

It's easy to read the first person's post either way. That you're lucky they ONLY got 26 corrupt files, or that 26 is extraordinarily unlikely.

Buge · on June 13, 2014

I think the "at-rest files through natural causes" gives it away.

pling · on June 11, 2014

Well I've been using NTFS exclusively since 1996 (NT4) and accumulated 870GB of data.

I've never had a single byte of corruption. Not one, not ever. The data is checksummed about once a month.

26 is a very high number for a deterministic system compared to zero.

My recentish MBP purchase (2011) resulted in a single corrupt file when copying 1/3 of that volume onto the machine over the LAN. This was 10.7 at the time. That scared me a little.

The Linux kit I operate has had no discernable data corruption either in the last 12 years.

I know this is an anecdote but over time that's a huge cumulative error.

kevinchen · on June 11, 2014

I can only give you anecdotal evidence, but hopefully it convinces you that HFS+ is the problem.

I've been using Macs for 5 years now. I've stored Time Machine backups spanning two drives and OS releases from Snow Leopard to Mavericks. (Each drive was used for about 2.5 years, and the OSes were installed soon after they became available.)

Every 9-15 months, I ask Disk Utility to repair the Time Machine volume. The first time, I lost the entire TM directory and had to format the drive and start over. The second time, it crashed Disk Utility because there were so many errors. The third time, I lost the older ~70% of my backups. The most recent time, the operating system refused to even recognize the disk as an HFS+ volume!

Each time, I had the opportunity to format the disk in question and check it for hardware issues. They all passed badblocks and SMART with flying colors. In addition, the volumes were unmounted properly the vast majority of the time (a few instances of human error and power outage)

tldr: HFS+ loses your data catastrophically even if you use it properly.

aroch · on June 11, 2014

On the other side of the anecdotal coin, I've been running OSX for a decade now and have in the neighborhood of 100TB of HFS+ formatted media, some in RAID, some as single drives. The few times I've catastrophically lost data/drives it's been due to the drives physically failing or, in the case of an SSD, a firmware bug. I've lost data to the filesystem (bitrot) but that wouldn't really be prevented by any other FS, they'd just tell me sooner.

By the same token, I've experienced much more corruption on my ZFS (~200TB) and btrfs (58TB) arrays over the last few years than I have on HFS for a decade.

Tloewald · on June 11, 2014

@Alupis -- um HFS+ is journaled, so I'm not really sure what your point is.

Alupis · on June 11, 2014

You know you're a real tech-y when you get fired up over filesystems!

mzs · on June 11, 2014

I've seen similar and noticed drives used for TM tend to fail more than other HFS+ formatted drives. One of my hunches is that SMART is not to be trusted and the repeated spin-up, do stuff, spin down of TM on those drives is the cause. The other is that rarely you will in fact get a garbled data packet over USB where the CRC happens to match.

kevinchen · on June 12, 2014

But when I test the drives with badblocks, nothing shows up. The hardware is fine -- I write bytes and it gives them back to me when I read.

I suspect it's because Time Machine works by creating tons and tons of hard links, which the HFS+ driver does not like. Regardless, it shouldn't be possible for a userspace application to corrupt the hard drive by creating a lot of links and folders.

TheCondor · on June 11, 2014

Yeah, it's a media problem. There are some missing and interesting pieces of information: was this a laptop? Does it get transferred while off or asleep? What did Disk Utility tell you about the drive?

baldfat · on June 11, 2014

HFS+ has had known file system issues and was supposed to be replaced by ZFS, but that crashed and burned. Apple has been working on HFS+ and adding features. HFS+ and NTFS are long in the tooth and really should have been replaced a long time ago.

mantrax5 · on June 11, 2014

Can we please stop using lottery/lightning/meteor analogies?

A modern computer does more operations in a single second than a small country buys lottery tickets in a lifetime.

Many wildly unlikely things become quite likely with computers.
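The arithmetic behind that is just 1 - (1 - p)^n for n independent trials. A quick sketch, where the example probabilities in the usage note are made-up round figures and not measured error rates:

```python
import math


def prob_at_least_once(p, trials):
    """Probability that an event with per-trial probability p occurs at least once
    in `trials` independent trials, i.e. 1 - (1 - p) ** trials.

    Uses log1p/expm1 so that very small p does not round away to zero.
    """
    if p >= 1.0:
        return 1.0
    return -math.expm1(trials * math.log1p(-p))
```

For instance, a one-in-10^15 event given 10^15 chances comes out at roughly 63% (1 - 1/e), which is nothing like lottery odds.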

myrandomcomment · on June 11, 2014

Just ordered this:

http://www.ixsystems.com/storage/freenas/

ZFS to stop the bit rot!

I found the same issues on my music, video and photos collections.

ajtaylor · on June 11, 2014

I've been using an HP MicroServer (the N54L to be exact) + FreeNAS with great success. My outlay was <$300 since I used disks I already had lying around and the performance is much better than the cheap home-grade NAS products.

PhantomGremlin · on June 11, 2014

I think it's indisputable that ZFS stops "bit rot". It's an excellent tool for that.

However, as a casual observer and long time ZFS wanna-be, I've noticed that the following two issues haven't really gone away:

1) Which version of ZFS? After Sun was assimilated by the Borg and after the great ZFS developer diaspora, everyone seems to have their fingers in the ZFS development pie. There are a plethora of derivative versions. Nothing wrong with that, but when there was a single canonical version there were "many eyes" on it. Bugs were hunted down and squashed. But now? Every "port" to a different OS variant introduces new opportunities for bugs, doesn't it? Is ZFS on FreeBSD as stable as on Solaris? And FreeNAS isn't pure FreeBSD, have they tweaked ZFS?

2) When ZFS is working, it's great. But it doesn't seem to simply fray around the edges. When it fails, it fails catastrophically. I've seen much advice on mailing lists to "restore from backup" after something goes wrong.

myrandomcomment · on June 12, 2014

Well in the case of what I am buying I am using the ZFS version that comes with FreeNAS which is based on FreeBSD. Everything can fail at some point. So I also have off site backup. I will have 4x4TB in this setup and only need 4GB of space so I think I will do the ZFS version of RAID10.

PhantomGremlin · on June 12, 2014

In retrospect, my post, while making valid points, was a bit of a Negative Nancy whine.

I'm glad you pointed out this low-cost ZFS appliance for home and SOHO use. It's not like when Sun sold ZFS boxes with 45 disks. Instead it's low cost, directly competitive with Synology and HP MicroServer boxes.

As Martha Stewart would say: "It's a good thing".

ansible · on June 11, 2014

Note that ZFS can detect the bit rot, but you'll need to actually run some kind of RAID (in this case RAID-Z) to have it fixed automatically. And if you do want bit rot detected on a timely basis, you'll need to have the data on the drives scrubbed on a regular basis (weekly? monthly?, not sure what is sufficient).

Freaky · on June 11, 2014

You can set the "copies" property to 2 or 3 even with a single disk, so if one dodgy sector blats one of your files, ZFS can still recover: https://blogs.oracle.com/relling/entry/zfs_copies_and_data_p...
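
A toy sketch of why the extra copy matters: a checksum on its own can only tell you a block is bad, while a second copy gives you something to repair it from. This is an illustration of the idea only, not how ZFS is actually implemented; `scrub` and its return values are made-up names.

```python
import hashlib
from typing import List


def checksum(block: bytes) -> str:
    """SHA-256 digest of a block, as a hex string."""
    return hashlib.sha256(block).hexdigest()


def scrub(copies: List[bytearray], expected: str) -> str:
    """Check every stored copy of a block against its known-good checksum.

    Returns "ok" if all copies verify, "repaired" if at least one good copy
    existed and the bad copies were overwritten from it, or "unrecoverable"
    if no copy matches the checksum (detection without repair).
    """
    good = [c for c in copies if checksum(bytes(c)) == expected]
    if len(good) == len(copies):
        return "ok"
    if not good:
        return "unrecoverable"
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good[0]  # rewrite the damaged copy in place
    return "repaired"


if __name__ == "__main__":
    data = bytearray(b"holiday photo bytes")
    expected = checksum(bytes(data))

    single = [bytearray(data)]
    single[0][0] ^= 0x01          # flip one bit in the only copy
    print(scrub(single, expected))

    mirrored = [bytearray(data), bytearray(data)]
    mirrored[1][0] ^= 0x01        # flip one bit in one of two copies
    print(scrub(mirrored, expected))
```

With one copy the flipped bit is noticed but cannot be undone; with two, the damaged copy is rewritten from the one that still verifies.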

ansible · on June 11, 2014

I didn't know that. Thanks.

therealmarv · on June 11, 2014

Ars Technica also has an in-depth review of HFS+. It seems HFS+ deeply trusts the hardware to find errors on disk: http://arstechnica.com/apple/2011/07/mac-os-x-10-7/12/ BUT all normal filesystems like NTFS and Ext2-3 are no different there.

billyhoffman · on June 11, 2014

Somewhere, John Siracusa's spider sense just twitched

mwfunk · on June 11, 2014

If he ends up being satisfied by Swift, I wonder what happens to his Copland 2010 angst? Does it go away, or does it bandwagon with the filesystem angst? My brain seems to follow a law of Conservation of Pessimism; it just gets redistributed when one of the sources goes away. He may be the same way. Regardless, his cause is true and his dedication to it is admirable.

zw · on June 11, 2014

If a table bell rings to signify such an event and it isn't recorded on a podcast, was it ever actually rung?

rcthompson · on June 11, 2014

So, what is a better filesystem that I can use with OS X for storing my data? EXT3 via FUSE driver? NTFS via NTFS-3G driver? ZFS via whichever ZFS-on-OSX fork is the current one for this month? FAT? ExFAT?

Or just back it up daily over the network to a Linux server in my closet?

therealmarv · on June 11, 2014

I agree. What is the alternative? Are the 26 files silently bad, or does OS X say anything about this? Is Ext2-4 better than HFS+? Is NTFS better? For sure ZFS is better... but are any of these standard file systems better (NTFS, EXT)? Is there any research? Maybe he would also get 26 bad files with NTFS and EXT3. This whole article is blurry for me.

lallysingh · on June 11, 2014

The article described how the photos decoded wrong. Sounds like OS X left them alone without noticing the problem. Instead, the user's attempt to read the files personally (when else do you decode?) detected the errors.

justincormack · on June 11, 2014

Use a FreeBSD based NAS with ZFS I guess is the best option. Not sure daily backups will help if it is being corrupted, as you will overwrite the good data at some point.

atmosx · on June 11, 2014

> HFS+ lost a total of 28 files over the course of 6 years.

That's a good number for average users. No one is using HFS+ as a file server. Users who have lots of data use external backup devices (which are prone to HW failure, especially the WD external HDs) or offsite backup services.

ps. Anyone used this[1] on macosx? Can it replace HFS+ in the root partition?

UPDATE: The faq clearly states that it can't be used as a root partition:

Q) Can I boot my computer off of O3X?

A) No. O3X cannot be used as your main system partition.

What a pity :-(

[1] https://openzfsonosx.org/

klapinat0r · on June 11, 2014

I've used it and still am, in somewhat of an experiment, but with great results:

I used to work with ZFS for a living, so I want it to work - maybe that makes me biased.

I wrote an article on the wiki[0] on CoreStorage and Encryption together with ZFS and it's been working as expected for a couple of months now.

I currently use it to test family/friends backup with SyncThing[1] to see if it can make a viable (albeit a bit hacky) backup solution for the common man, with file history based on routine snapshots (which will cause problems, e.g. how Dropbox explicitly doesn't sync open Word documents, or how a VM's disk image might not be great to back up "as is" while in use).

As a final note: it cannot replace HFS+ for Time Machine backups either.

[0]: https://openzfsonosx.org/wiki/Encryption

[1]: http://syncthing.net/

atmosx · on June 11, 2014

Great, thanks for the heads up. Do you think that there is any hope to use ZFS as a full-featured replacement in the future for OSX?

klapinat0r · on June 11, 2014

I'm not knowledgeable enough in that area, but as far as I know the OS X root needs to be HFS.

I can only answer by similarity: to boot from Btrfs in Arch Linux you'll need to load kernel modules in the bootloader.

Two reasons:

1. btrfs isn't "built in" and needs to be loaded.

2. the boot loader needs to be able to read btrfs (obvious, but mentioning for the sake of mentioning)

That's doable for btrfs/Arch because we can put our own bootloader with btrfs compatibility in.

Which leads me to my speculative answer:

No. It might be possible with a third party bootloader (iBoot comes to mind), but to have Apple adopt the OpenZFS implementation, or to revive their own, seems unlikely.

My guess would be that they are working on something, mainly because they have to find an alternative. Other people have talked about this being a necessary next step for Mac for years, but it does seem plausible that it's in the works.

Replacement for HFS+ - ZFS? Probably not. Something else? Maybe WWDC 2015-2016.

EDIT: In case you mean full-featured as in "ZFS full-featured", then yes, that's already there. Not Oracle full-featured, but ZoL (the ZFS on Linux port) and somewhat illumos compatible.

huxley · on June 11, 2014

Dominic Giampaolo, who did a lot of the work on the BeOS file system, has been working for Apple since around 2002:

http://www.nobius.org/~dbg/

therealmarv · on June 11, 2014

http://www.idt.mdh.se/kurser/ct3340/ht09/ADMINISTRATION/IRCS... Page 8 and 9 of this analysis of file systems. NTFS and Ext3 would not be better. The problem the author describes is more and more a hardware problem. No standard filesystem will automatically repair bad blocks.

jsz0 · on June 11, 2014

HFS+ is old and ugly but in practical terms it's good enough. Obviously you cannot rely on any single file system no matter how many mirror/parity disks you throw at it. You need at least one local backup on an independent file system / disks and ideally also an offsite backup. The odds of HFS+ corruption of the same file 3 different file systems are incredibly low.

lallysingh · on June 11, 2014

Backing up a ZFS volume would have detected the corruption error, giving you a chance to (a) not make a backup of the corrupted version and (b) restore a good version from disk.

In the same situation, HFS+ would let you make a backup of corrupted data. If you don't keep all your historical backups, you may end up unwittingly tossing out the last backups with good versions of those files.
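
On a filesystem without data checksums you can approximate that check yourself. Below is a rough sketch, assuming an archive of files that are not supposed to change (photos, music): record a SHA-256 digest per file once, then compare against it before each backup run. The function names and manifest layout are made up for illustration, and the check cannot tell bit rot from a deliberate edit, so it only suits write-once data.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return manifest entries that are now missing or whose contents
    no longer match the recorded digest."""
    bad = []
    for rel, digest in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != digest:
            bad.append(rel)
    return bad
```

If `changed_files` returns anything, skip or quarantine those files instead of letting the backup overwrite the last good version.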

I'm sorry, I don't understand your last sentence.

XorNot · on June 11, 2014

If you're backing up without versions, you're inviting disaster.

zurn · on June 11, 2014

> Modern file systems like ZFS, which Apple considered but abandoned as a replacement, include checksums of all meta data structures. That means that when the file is accessed, the filesystem detects the corruption and throws an error.

This is exactly backwards - metadata checksums don't protect file contents. They just cover the integrity of the FS, so when the FS internals are corrupted, it knows not to write to random places on the disk and, with redundancy, can try to recover the metadata.

alcari · on June 11, 2014

The quote is wrong: ZFS checksums both the user data and metadata in a Merkle tree, optionally using cryptographic hashes (e.g. SHA-256).
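
For anyone unfamiliar with the term, here is a minimal sketch of a Merkle tree over data blocks. It is a simplified binary tree for illustration, not ZFS's on-disk layout (ZFS keeps each block's checksum in the pointer held by its parent block); `merkle_root` and the odd-level duplication rule are choices made for this example.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    """Raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: List[bytes]) -> bytes:
    """Hash every block, then repeatedly hash pairs of digests until a
    single root digest remains. A level with an odd number of digests
    duplicates its last one (an arbitrary choice for this sketch)."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [sha256(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because every parent digest covers its children, a flipped bit in any data block changes the root, which is why corruption in file contents (not just metadata) gets noticed on read.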