
HFS+ Bit Rot - adamzegelin
http://blog.barthe.ph/2014/06/10/hfs-plus-bit-rot/
======
chuckup
I'm surprised how little is done to protect data in modern systems. The two
big things we should be doing - ECC ram and checksumming filesystems - are
still nowhere close to being mainstream.

I recently bought and returned three(!) 3TB drives because I found they were
silently corrupting data every 500GB of writes, or so (verified by testing on
multiple systems) - I switched to 2TB from another brand, and had zero issues.
I only knew there was a problem because I wrote a burn-in script to copy &
verify over and over. The filesystem and OS don't care; it's treated as the drive's job.
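A burn-in script of the kind described can be sketched in a few lines of Python. This is a hypothetical minimal version, not the commenter's actual script: it writes random data, forces it to the device, reads it back, and compares SHA-256 hashes.

```python
import hashlib
import os

def burn_in(path, passes=3, size=1024 * 1024, chunk=64 * 1024):
    """Write random data to `path`, read it back, and compare SHA-256
    hashes. Returns False on the first silent mismatch."""
    for _ in range(passes):
        data = os.urandom(size)
        expected = hashlib.sha256(data).hexdigest()
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device
        actual = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                actual.update(block)
        if actual.hexdigest() != expected:
            return False  # silent corruption detected
    return True
```

Note that a real burn-in has to defeat the page cache - either by writing far more data than the machine has RAM, or by using direct I/O - otherwise the read-back may be served from memory and never touch the drive at all.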

Almost every USB stick I've ever used eventually had some small corruption
issue - again, no way to catch this unless you are looking for this kind of
thing.

The average consumer does not think this is even possible - the idea of your data
silently getting scrambled seems impossible, like a car randomly being unable
to brake - it is just assumed this sort of thing does not happen. But as hard
drives get bigger and people put more RAM in their systems I think this will
become a huge issue. Of course, consumers will blame "a virus" or something
along those lines.

~~~
binarycrusader
It doesn't help that Intel has artificially limited ECC support to specific
processors. As the amount of memory in PCs and the frequency at which
components operate both increase, the probability of memory errors rises, yet
Intel still foolishly refuses to support ECC in its consumer line, selling it
as a "server" feature and limiting it to Xeons.

~~~
theandrewbailey
Perhaps AMD could enable ECC on their consumer CPUs?

~~~
binarycrusader
AMD does for their FX processors, but support is mostly dependent on
the motherboard. With that said, most of the AMD motherboards for AM3+ support
it, and it's limited to unbuffered ECC.

Unfortunately, that means choosing ECC at the cost of power and performance
relative to Intel. It's really a sad state of affairs.

------
nandhp
> HFS+ is seriously old

Sure, in computing terms 1998 is an eon ago. But that's not a good reason to
stop using a file system. Lots of older file systems are still in use: FAT32
(1996; still the default for SD cards <32GB and anytime you need a cross-
platform filesystem), XFS (1994; the new default filesystem for RHEL7), NTFS
(July 1993; WinFS still hasn't materialized), and ext2 (January 1993; still
commonly used, particularly in situations where a journal is not required).

Of course, I'm perfectly happy to believe that HFS+ is more badly-designed
than any of the other filesystems from that time. But a filesystem doesn't
need to be replaced just because it's no longer trendy.

~~~
MBCook
Here's the thing. I used FAT16 for years and years and years and never had a
problem. I used FAT32 for the majority of a decade and never had a problem. I
used NTFS for a number of years and never had a problem.

While HFS+ isn't quite as bad right now, I can tell you that in the 10.4-10.5
days it seemed to enjoy corrupting files. You rebooted your computer? Here's a
couple of missing files in your trash. Hope you can figure out which binary
thing on your multi-hundred-gigabyte disk they fit in.

HFS+ is the ONLY filesystem I have ever used on which I have lost bits of data
to silent corruption. I don't trust it. I would kill for Apple to adopt ZFS
or NTFS. At this point I want it checksumming all of my files and my
filesystem so I know that it's not being corrupted silently by the OS making
weird mistakes on a filesystem that should've been replaced 10 years ago and
is just hack after hack on a system designed for a computer that only had
128K of memory.

I love everything about OS X. Except HFS plus, which needs to die in a fire.

Nothing is quite as much fun as looking at an old picture or listening to a
song I haven't listened to in quite a while just to find that it's actually
silently corrupted and has been for multiple years and I probably don't have
the correct version on a backup.

I'd trust ext2 over HFS+.

Yes, I'm bitter.

~~~
iSnow
>I used FAT16 for years and years and years and never had a problem. I used
FAT32 for the majority of a decade and never had a problem

>I'd trust ext2 over HFS+.

>Yes, I'm bitter.

Unfortunately not only bitter, but bitter to the point of arguing senselessly
just for the sake of argument. Trusting a non-journaled FS over a journaled
one without error correction/detection is a bad choice - like career-limiting
bad. And if you never had problems with FATxx, then you most likely never used
DOS or old Windows seriously. Corrupted data, and specifically a corrupted
MBR, was really frequent back then (much like corrupted b-trees on the Mac side).

~~~
Dylan16807
Most journals are metadata-only, so they save you from corrupt directories but
don't protect contents at all. NTFS loves to wipe out just-written files when
I lose power, for example. Files that previously existed, mind you, now
suddenly garbage.

------
userbinator
Hard drives (of both the magnetic and flash-based varieties) all have built-in
error detection and correction. If you are getting corrupt files, that's not
the filesystem's fault; it's most likely a problem with the hardware.

Checksums at the FS level are _very_ rare; the majority of the ones in use
don't have them
([http://en.wikipedia.org/wiki/Comparison_of_file_systems](http://en.wikipedia.org/wiki/Comparison_of_file_systems)
) and yet they function perfectly fine. HFS+ is not the problem here.

~~~
rcthompson
Are you saying that hard drives store everything on disk with checksums or
redundancy? That would be news to me. How else could they correct (or even
detect) errors?

Anyway, are you arguing that filesystems shouldn't bother with checksumming at
all?

~~~
kevinday
Hard drives store things with varying forms of ECC. Each sector has an ECC
field, allowing it to detect many errors and automatically correct some.

This isn't a replacement for something better, it just allows simple bit
errors to be corrected automatically by the drive. The problem is that it's
not really obvious when it's happening, and you only notice when it can't fix
something. Drives do eventually throw a SMART error when they've had to do too
many corrections, though.

~~~
userbinator
Correctable errors are logged by SMART - see the Raw Read Error Rate, Seek
Error Rate, Hardware ECC Recovered, and related attributes.
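On Linux or FreeBSD these counters can be read with `smartctl` from the smartmontools package. A sketch, assuming a SATA disk at `/dev/sda` (the device name is a placeholder):

```shell
# Dump all SMART attributes for the disk (run as root).
smartctl -A /dev/sda

# Filter for the error-related counters mentioned above.
smartctl -A /dev/sda | grep -Ei 'raw_read_error|seek_error|ecc'
```

Note that on some vendors the raw values of these attributes are encoded rather than simple counts, so trends matter more than absolute numbers.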

------
ksec
The reason ZFS didn't pan out was simply that it used too much CPU and
memory - not a cost worth paying when the majority of Apple's devices are now
mobile (phone / tablet / laptop).

It might have been great if the Time Capsule, or an Apple NAS, used ZFS. But
it seems Apple would rather you move everything to iCloud Drive (finally!).

I think Apple's new filesystem will be designed entirely for flash - something
similar to Samsung's F2FS. Since F2FS is GPLv2-licensed, it is not possible
for Apple to use it within their own kernel.

~~~
danieldk
_The reason why ZFS didn't pan out was simply because it was using too much
CPU and memory._

I am not sure that is the reason. Apple did already announce it at WWDC after
all. ZFS on OS X was announced in June 2007. In September 2007, NetApp sued
Sun over patents violations in ZFS.

It's likely that Apple didn't want to risk a patent suit after adopting ZFS as
their main file system. The Apple of 2007 was a completely different size than
the Apple of 2014.

~~~
msbarnett
They announced support for it, not that it was the new default. Big
difference.

ZFS _support_ made sense for Mac OS X Server back in 2007. It's a beefy server
filesystem that rewards beefy servers with gobs of ECC RAM, RAID, and no
battery to worry about.

ZFS as the default replacement for all of Apple's HFS+ use cases (laptops,
iPods, the phones and tablets in the works) made no sense in 2007 and makes no
sense in 2014. ZFS is simply too resource-intensive and too dependent on ECC
RAM, even now, for consumer use cases.

------
hosay123
I don't know the numbers, but the probability of getting 26 files corrupted at
rest through natural causes sounds pretty much like winning the lottery twice
on the same day you were struck by lightning twice.

Checksums wouldn't have fixed this, they'd only alert the user to the fact the
damage had already been done, which is exactly what the decompressor did in
its own special way.

As another comment points out, error-correcting codes are the way to handle
this; it's already done in hardware, and probably too expensive to do in
software in the general case.
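The distinction between detection and correction can be shown with a toy Python sketch. This is purely illustrative - real drives use Reed-Solomon or LDPC codes, not naive triple repetition - but it makes the point: a checksum only notices the flipped bit, while a redundant encoding can repair it.

```python
import hashlib

def encode(data: bytes) -> bytes:
    """Toy ECC: store three copies of the data (triple repetition)."""
    return data * 3

def decode(coded: bytes) -> bytes:
    """Majority vote per byte corrects corruption confined to one copy."""
    n = len(coded) // 3
    copies = coded[:n], coded[n:2 * n], coded[2 * n:]
    return bytes(sorted(trio)[1] for trio in zip(*copies))

original = b"some file contents"
digest = hashlib.sha256(original).hexdigest()

stored = bytearray(encode(original))
stored[4] ^= 0xFF  # a "bit rot" event inside the first copy

# A checksum over the damaged copy only *detects* the problem...
assert hashlib.sha256(bytes(stored[:len(original)])).hexdigest() != digest
# ...while the redundant encoding *corrects* it.
assert decode(bytes(stored)) == original
```

The trade-off is visible even in the toy: correction costs 3x the storage here, which is why drives do it in dedicated per-sector ECC fields rather than in general-purpose software.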

~~~
kevinchen
I can only give you anecdotal evidence, but hopefully it convinces you that
HFS+ is the problem.

I've been using Macs for 5 years now. I've stored time machine backups
spanning two drives and OS releases from snow leopard to mavericks. (Each
drive was used for about 2.5 years, and the OSes were installed soon after
they became available.)

Every 9-15 months, I ask Disk Utility to repair the Time Machine volume. The
first time, I lost the entire TM directory and had to format the drive and
start over. The second time, it crashed Disk Utility because there were so
many errors. The third time, I lost the older ~70% of my backups. The most
recent time, the operating system _refused to even recognize the disk as an
HFS+ volume!_

Each time, I had the opportunity to format the disk in question and check it
for hardware issues. They all passed badblocks and SMART with flying colors.
In addition, the volumes were unmounted properly the vast majority of the time
(a few instances of human error and power outages aside).

tldr: HFS+ loses your data catastrophically even if you use it properly.

~~~
aroch
On the other side of the anecdotal coin, I've been running OS X for a decade
now and have in the neighborhood of 100TB of HFS+-formatted media, some in
RAID, some as single drives. The few times I've catastrophically lost
data/drives it's been due to the drives physically failing or, in the case of
an SSD, a firmware bug. I've lost data to bitrot, but that wouldn't really
have been prevented by any other FS; they'd just have told me sooner.

By the same token, I've experienced much more corruption on my ZFS (~200TB)
and btrfs (58TB) arrays over the last few years than I have on HFS for a
decade.

~~~
Tloewald
@Alupis -- um HFS+ is journaled, so I'm not really sure what your point is.

------
myrandomcomment
Just ordered this:

[http://www.ixsystems.com/storage/freenas/](http://www.ixsystems.com/storage/freenas/)

ZFS to stop the bit rot!

I found the same issues on my music, video and photos collections.

~~~
PhantomGremlin
I think it's indisputable that ZFS stops "bit rot". It's an excellent tool for
that.

However, as a casual observer and long time ZFS wanna-be, I've noticed that
the following two issues haven't really gone away:

1) Which version of ZFS? After Sun was assimilated by the Borg and after the
great ZFS developer diaspora, everyone seems to have their fingers in the ZFS
development pie. There are a plethora of derivative versions. Nothing wrong
with that, but when there was a single _canonical_ version there were "many
eyes" on it. Bugs were hunted down and squashed. But now? Every "port" to a
different OS variant introduces new opportunities for bugs, doesn't it? Is ZFS
on FreeBSD as stable as on Solaris? And FreeNAS isn't pure FreeBSD, have they
tweaked ZFS?

2) When ZFS is working, it's great. But it doesn't seem to simply fray around
the edges. When it fails, it fails catastrophically. I've seen much advice on
mailing lists to "restore from backup" after something goes wrong.

~~~
myrandomcomment
Well, in the case of what I am buying, I am using the ZFS version that comes
with FreeNAS, which is based on FreeBSD. Everything can fail at some point, so
I also have offsite backup. I will have 4x4TB in this setup and only need 4TB
of space, so I think I will do the ZFS version of RAID10.
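The "ZFS version of RAID10" is a pool of striped mirrors. On FreeBSD/FreeNAS it would look roughly like this (pool and device names are placeholders):

```shell
# Two mirrored pairs; ZFS stripes across the mirror vdevs, giving a
# RAID10-like layout: ~8TB usable from 4x4TB, any one disk per pair
# can fail.
zpool create tank mirror ada0 ada1 mirror ada2 ada3

# Check the resulting layout.
zpool status tank
```

Mirrors also matter for bit rot specifically: when a scrub or read finds a block whose checksum doesn't match, ZFS can rewrite it from the other side of the mirror.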

~~~
PhantomGremlin
In retrospect, my post, while making valid points, was a bit of a Negative
Nancy whine.

I'm glad you pointed out this low-cost ZFS appliance for home and SOHO use.
It's not like when Sun sold ZFS boxes with 45 disks. Instead it's low cost,
directly competitive with Synology and HP MicroServer boxes.

As Martha Stewart would say: "It's a good thing".

------
therealmarv
Ars Technica also has an in-depth review of HFS+. It seems HFS+ deeply trusts
the hardware to find errors on disk:
[http://arstechnica.com/apple/2011/07/mac-
os-x-10-7/12/](http://arstechnica.com/apple/2011/07/mac-os-x-10-7/12/) But
ordinary filesystems like NTFS and ext2/3 are no different there.

------
billyhoffman
Somewhere, John Siracusa's spider sense just twitched

~~~
mwfunk
If he ends up being satisfied by Swift, I wonder what happens to his Copland
2010 angst? Does it go away, or does it bandwagon with the filesystem angst?
My brain seems to follow a law of Conservation of Pessimism; it just gets
redistributed when one of the sources goes away. He may be the same way.
Regardless, his cause is true and his dedication to it is admirable.

------
rcthompson
So, what is a better filesystem that I can use with OS X for storing my data?
EXT3 via FUSE driver? NTFS via NTFS-3G driver? ZFS via whichever ZFS-on-OSX
fork is the current one for this month? FAT? ExFAT?

Or just back it up daily over the network to a Linux server in my closet?

~~~
therealmarv
I agree. What is the alternative? Are the 26 files silently bad, or does OS X
say anything about it? Is ext2-4 better than HFS+? Is NTFS better? ZFS is
surely better... but are any of the standard file systems (NTFS, ext) better?
Is there any research? Maybe he would also have gotten 26 bad files with NTFS
or ext3. This whole article is blurry to me.

~~~
lallysingh
The article described how the photos decoded wrongly. It sounds like OS X left
them alone without noticing the problem; instead, the user's own attempt to
read the files (when else would you check?) detected the errors.

------
atmosx
> HFS+ lost a total of 28 files over the course of 6 years.

That's a good number for average users. No one is using HFS+ as a file server.
Users who have lots of data use external backup devices (which are prone to HW
failure, especially the WD external HDs) or offsite backup services.

ps. Has anyone used this[1] on Mac OS X? Can it replace HFS+ on the root
partition?

UPDATE: The FAQ clearly states that it can't be used as a root partition:

Q) Can I boot my computer off of O3X? A) No. O3X cannot be used as your
main system partition.

what a pity :-(

[1] [https://openzfsonosx.org/](https://openzfsonosx.org/)

~~~
klapinat0r
I've used it and still am, in somewhat of an experiment, but with great
results:

I used to work with ZFS for a living, so I want it to work - maybe that makes
me biased.

I wrote an article on the wiki[0] on CoreStorage and Encryption together with
ZFS and it's been working as expected for a couple of months now.

I currently use it to test family/friends backups with SyncThing[1], to see if
it can make a viable (though somewhat hacky) backup solution for the common
man, with file history based on routine snapshots (which _will_ cause
problems - e.g. the way Dropbox explicitly doesn't sync open Word documents,
or how a VM's disk image might not be great to back up "as is" while in use).
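Routine snapshots of that kind can be a one-line cron job. A sketch, assuming a dataset named `tank/backups` (the name is hypothetical):

```shell
# Take a dated snapshot; ZFS snapshots are copy-on-write, so they are
# near-instant and only consume space as files change.
zfs snapshot tank/backups@$(date +%Y-%m-%d)

# List the accumulated history; old snapshots can be destroyed as
# they age out.
zfs list -t snapshot -r tank/backups
```

This is what gives the "file history" mentioned above: each day's snapshot preserves whatever SyncThing had written by that point, open-file caveats and all.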

As a final note: it can not replace HFS+ for TimeMachine backup either.

[0]:
[https://openzfsonosx.org/wiki/Encryption](https://openzfsonosx.org/wiki/Encryption)

[1]: [http://syncthing.net/](http://syncthing.net/)

~~~
atmosx
Great, thanks for the heads up. Do you think there is any hope of ZFS becoming
a full-featured replacement on OS X in the future?

~~~
klapinat0r
I'm not knowledgeable enough in that area, but as far as I know the OS X root
needs to be HFS.

I can only answer by similarity: to boot from Btrfs in Arch Linux you'll need
to load kernel modules in the bootloader.

Two reasons:

1\. btrfs isn't "built in" and needs to be loaded.

2\. the boot loader needs to be able to read btrfs (obvious, but mentioned
for the sake of completeness).

That's doable for btrfs/Arch because we can put our own bootloader with btrfs
compatibility in.

Which leads me to my _speculative_ answer:

 _No_. It might be possible with a third-party bootloader (iBoot comes to
mind), but to have Apple adopt the OpenZFS implementation, or to revive their
own, seems unlikely.

My guess would be that they are working on something, mainly because they
_have_ to find an alternative. Other people have talked about this being a
necessary next step for the Mac for years, and it does seem plausible that
it's in the works.

A replacement for HFS+ - ZFS? Probably not. Something else? Maybe WWDC
2015-2016.

 _EDIT_ : In case you mean full-featured as in "ZFS full-featured", then yes,
that's already there. Not Oracle full-featured, but ZoL (the ZFS on Linux
port) and somewhat illumos compatible.

------
huxley
Dominic Giampaolo, who did a lot of the work on the BeOS file system, has been
working for Apple since around 2002:

[http://www.nobius.org/~dbg/](http://www.nobius.org/~dbg/)

------
therealmarv
[http://www.idt.mdh.se/kurser/ct3340/ht09/ADMINISTRATION/IRCS...](http://www.idt.mdh.se/kurser/ct3340/ht09/ADMINISTRATION/IRCSE09-submissions/ircse09_submission_16.pdf)
Page 8 and 9 of this analysis of file systems. NTFS and Ext3 would not be
better. The problem the author describes is more and more a hardware problem.
No standard filesystem will automatically repair bad blocks.

------
jsz0
HFS+ is old and ugly, but in practical terms it's good enough. Obviously you
cannot rely on any single file system, no matter how many mirror/parity disks
you throw at it. You need at least one local backup on an independent file
system / disks, and ideally also an offsite backup. The odds of corruption of
the same file on 3 different file systems are incredibly low.

~~~
lallysingh
Backing up a ZFS volume would have detected the corruption error, giving you a
chance to (a) not make a backup of the corrupted version and (b) restore a
good version from disk.

In the same situation, HFS+ would let you make a backup of corrupted data. If
you don't keep all your historical backups, you may end up unwittingly tossing
out the last backups with good versions of those files.
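With ZFS the check doesn't even have to wait for a backup run: a scrub re-reads every block in the pool against its checksum. A sketch, assuming a pool named `tank`:

```shell
# Walk the whole pool and verify every block's checksum. With
# redundancy (mirror/raidz), damaged blocks are rewritten from a good
# copy; without it, they are reported so you know which files to
# restore from backup.
zpool scrub tank

# Shows scrub progress and, after "errors:", a per-file list of any
# permanent (uncorrectable) errors found.
zpool status -v tank
```

Run periodically, this is exactly what closes the gap described above: you learn which files are bad before their last good backup rotates away.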

I'm sorry, I don't understand your last sentence.

~~~
XorNot
If you're backing up without versions, you're inviting disaster.

------
zurn
> Modern file systems like ZFS, which Apple considered but abandoned as a
> replacement, include checksums of all meta data structures. That means that
> when the file is accessed, the filesystem detects the corruption and throws
> an error.

This is exactly backwards - metadata checksums don't protect file contents.
They just cover the integrity of the FS itself: when the FS internals are
corrupted, it knows not to write to random places on the disk and, with
redundancy, can try to recover the metadata.

~~~
alcari
The quote is wrong: ZFS checksums both user data and metadata in a Merkle
tree, optionally using cryptographic hashes (e.g. SHA-256).
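A minimal sketch of that idea (illustrative only, not ZFS's actual on-disk format): each block is hashed, and each parent hashes the hashes of its children, so a single flipped bit anywhere in the data changes the root.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    """Hash the leaves, then repeatedly hash adjacent pairs until a
    single root hash remains."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-%d" % i for i in range(8)]
root = merkle_root(blocks)

blocks[3] = bytes([blocks[3][0] ^ 1]) + blocks[3][1:]  # flip one bit
assert merkle_root(blocks) != root  # corruption is visible at the root
```

This is why data checksums in a tree protect contents in a way plain metadata checksums can't: verifying the path from root to leaf verifies the user data itself.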

