
Linus Torvalds on HFS+ - kannonboy
https://plus.google.com/+JunioCHamano/posts/1Bpaj3e3Rru
======
kennywinker
While I really like thristian's explanation of why case insensitivity adds
massive complexity to a system:
([https://news.ycombinator.com/item?id=8876873](https://news.ycombinator.com/item?id=8876873))

I have to point out that case sensitivity offloads a bunch of that complexity
to the user. This is almost definitely why OS X uses case-insensitive HFS+ by
default.

As an ultra-simple example: with a case-sensitive file system I can have two
directories, "documents" and "Documents". Which one are my documents in? Half
in one, half in the other, probably.

I'm not saying Linus et al. are _wrong_ that case-sensitive is the way to go,
but there are some reasonable arguments for trying to take that complexity
back from the user and deal with it in the file system.

Source: I'm a long time mac user, and I just switched to a case-sensitive file
system last year. The win of having my dev machine match my deployment
environment (iOS) is bigger than any downsides I've seen yet.

~~~
beedogs
> As an ultra-simple example: with a case-sensitive file system I can have two
> directories, "documents" and "Documents". Which one are my documents in?
> Half in one, half in the other, probably.

That only really happens if you're an idiot who just haphazardly throws
documents into any directory that looks like it can still hold files. And it's
not an indictment of case sensitivity; you can achieve the same sort of
stupidity in plenty of other ways if you're determined to do so.

~~~
tedunangst
Idiot here. I like to save downloaded files in a directory called "downloads".
Firefox has decided it would prefer to save them in a directory called
"Downloads". I can tell Firefox to save them in the downloads directory, but
sooner or later, it inevitably decides it's tired of that and goes back to
saving them in Downloads.

~~~
restalis
That's just a problem with the application (and it just happens for that
problem to be hidden by a case-insensitive file-system). For a solution, if
you can't bend your application to your will, bend yourself to it and rename
your "downloads" to whatever Firefox likes.

------
justizin
Several years ago, when I'd first moved to SF, I got a scholarship to WWDC for
working for ACM, and I took advantage of the opportunity to go to a Birds-of-
Feather for people interested in "Darwin filesystems" or something like that.
It was basically a little conference room with the FS team, and I asked, flat-
out, "Why case insensitive?"

And they answered, flat-out: "Microsoft Office"

Even this past week, I was talking with coworkers about having trouble with
nothing but Valve Steam on my Macs that have a case-sensitive filesystem[0].
That's particularly odd, since it works on Linux now, but that's another
matter.

What I found most notable about this thread, is this quote from Linus:

"And Apple let these monkeys work on their filesystem? Seriously?"

I'm pretty sure Apple actually _fired_ anyone who wanted any of the things
done anything close to any way that Linus Torvalds would agree with.

ext* not being particularly perfect, I'm happy to have both. I mean, ext2 is
hard to complain about, but it comes from an era where basically all
filesystems were terrible, literally the era when SGI started installing
backup batteries to race with fsync().

ext4 has an alarming number of corruption bugs, but I'm sure it's not because
of insane unicode handling, though I take Linus' description of how the OSX
filesystem works with a grain of salt. He can't possibly _care_ to know as
much about it as he knows about Linux's.

[0] achievable by formatting HFS+X in Disk Utility in Recovery Mode, then
installing onto that drive

~~~
phaemon
> ext4 has an alarming number of corruption bugs

Which bugs are these?

> though I take Linus' description of how the OSX filesystem works with a
> grain of salt. He can't possibly _care_ to know as much about it as he knows
> about Linux's.

[http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.g...](http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/fs/hfsplus)

~~~
justizin
"Which bugs are these?"

My favorite is, at least a couple years back, if you had a KVM guest with an
ext4 filesystem in an image file, on an ext4 filesystem, the guest OS could
corrupt the host OS.

#winning

Obviously, ext4 is largely a very good FS, but at least twenty people I've
managed hundreds of servers with agree that if you do not need huge
filesystems, huge files, or directories with tons of files, ext3 is a safer
choice.

I'm not saying Linus is incompetent, just that his criticism of other
filesystems shouldn't be processed as if his work has no faults.

One thing I wonder is, is it the filesystem, or the C library, that determines
things like how unicode is interpreted in paths - his overwhelming rant focus.

~~~
phaemon
> at least a couple years back, if you had a KVM guest with an ext4 filesystem
> in an image file, on an ext4 filesystem, the guest OS could corrupt the host
> OS.

Your "alarming number of corruption bugs" is one bug from a couple of years
back?

ext4 is fairly well established now. A few years ago it might have been new
enough that there were edge cases that needed investigation, but it's robust
enough now.

And there's a difference between a decent design having some bugs that can be
fixed and a fundamentally broken design. Linus seems to be arguing that HFS+
is the latter.

The problem with HFS+ is precisely that it needs to handle unicode in the
filesystem because it needs to consider different names as the same. And,
seemingly, it does it even worse than NTFS does.

------
Someone1234
Can someone explain to me why case insensitivity is a bad thing? Clearly Linus
believes so but didn't explain why he believes so.

Most UNIX and Linux systems seem to have an "all lowercase" or "all uppercase"
convention, so the fact that they have case sensitivity is often not utilised.

In fact the biggest reason you'd want case sensitivity off the top of my head
is legacy support but that's just a circular argument (since you never really
reach WHY it was that way originally, just that it was).

I guess based on what he talks about next he is worried about how case
insensitivity interacts with other character sets (i.e. does it correctly
change their case), but for most sets isn't the lower and upper case defined
in the UNICODE language spec itself?

~~~
thristian
The prime number-one concern in kernel programming is managing complexity.
Well, in most programming really, but in kernel programming unmanaged
complexity leads to lost data and sometimes broken hardware instead of "just"
crashes.

Case-sensitivity is the easiest thing - you take a bytestring from userspace,
you search for it exactly in the filesystem. Difficult to get wrong.

Case-insensitivity for ASCII is slightly more complex - thanks to the clever
people who designed ASCII, you can convert lower-case to upper-case by
clearing a single bit. You don't want to _always_ clear that bit, or else
you'd get weirdness like "`" being the lowercase form of "@", so there's a
couple of corner-cases to check.
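
A minimal sketch of that bit trick, in Python for readability (a kernel would do this in C). The function names `ascii_upper` and `ascii_names_equal` are made up for illustration and do not come from any real filesystem:

```python
def ascii_upper(byte: int) -> int:
    """Upper-case one ASCII byte by clearing bit 0x20, but only for a-z."""
    if ord("a") <= byte <= ord("z"):
        return byte & ~0x20
    return byte


def ascii_names_equal(a: bytes, b: bytes) -> bool:
    """Compare two byte strings, ignoring ASCII case only."""
    if len(a) != len(b):
        return False
    return all(ascii_upper(x) == ascii_upper(y) for x, y in zip(a, b))


# Clearing the bit without the range check is the corner case:
# "`" (0x60) would collapse onto "@" (0x40).
naive = ord("`") & ~0x20
```

The range check is the whole "corner case": two comparisons per byte, no tables, and the length of the name never changes.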

Case-insensitivity for Unicode is a giant mud-ball by comparison. There's no
simple bit flip to apply, just a 66KB table of mappings[1] you have to hard-
code. And that's not all! Changing the case of a Unicode string can change its
length (ß -> SS), sometimes lower -> upper -> lower is not a round-trip
conversion (ß -> SS -> ss), and some case-folding rules depend on locale (In
Turkish, uppercase LATIN SMALL LETTER I is LATIN CAPITAL LETTER I WITH DOT
ABOVE, not LATIN CAPITAL LETTER I like it is in ASCII). Oh, and since Unicode
requires that LATIN SMALL LETTER E + COMBINING ACUTE ACCENT should be treated
the same way as LATIN SMALL LETTER E WITH ACUTE, you also need to bring in the
Unicode normalisation tables too. And keep them up-to-date with each new
release of Unicode.
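
The case-mapping oddities above can be seen from Python's built-in string methods. This is only a sketch of what Python's copy of the Unicode tables does; it says nothing about how HFS+ or any other filesystem folds case, and `caseless_equal` is a hypothetical helper:

```python
sharp_s = "\u00df"               # LATIN SMALL LETTER SHARP S
upper = sharp_s.upper()          # one code point becomes two
round_trip = upper.lower()       # and lowering again does not restore it

# str.upper() and str.lower() take no locale, so the Turkish rule cannot
# be expressed here: "i" always upper-cases to plain "I".
default_upper_i = "i".upper()

# LATIN CAPITAL LETTER I WITH DOT ABOVE lower-cases to "i" followed by
# COMBINING DOT ABOVE, so this conversion changes the length too.
turkish_lower = "\u0130".lower()


def caseless_equal(a: str, b: str) -> bool:
    """Locale-free caseless comparison using Unicode case folding."""
    return a.casefold() == b.casefold()
```

Note that `caseless_equal` still ignores normalisation, which is a separate table-driven step.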

So the last thing a kernel developer wants is Unicode support in a filesystem.

[1]:
[http://www.unicode.org/Public/UNIDATA/CaseFolding.txt](http://www.unicode.org/Public/UNIDATA/CaseFolding.txt)

~~~
Someone
_" And keep them up-to-date with each new release of Unicode"_

I agree with the rest, but that is one thing you absolutely shouldn't do. Once
there are disks 'out there' that were created with some idea about what case
insensitivity is, your choice has been set in stone. The risk of changing any
rule is simply too high. Somebody might start reading that disk using the
previous version of the file system.

For example, the precursor to HFS+, HFS, like HFS+, kept directories sorted by
name. However, it had a filename sorting bug that the Finder had to work
around (see
[http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.ht...](http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.html#UnicodeSubtleties))

~~~
ori_b
> _I agree with the rest, but that is one thing you absolutely shouldn't do._

And now you have a file system that is case sensitive, BUT ONLY IN SOME
UNICODE RANGES. Somehow, that's better than just not being case sensitive in
the first place?

------
terminus
Also a longer critique by John Siracusa here:
[http://arstechnica.com/apple/2011/07/mac-os-x-10-7/12/#hfs-p...](http://arstechnica.com/apple/2011/07/mac-os-x-10-7/12/#hfs-problems)

I wonder how this ties in with the whole Apple philosophy of "Design is how it
works."

Clearly the innards look nothing like the facade.

~~~
Someone1234
> File system metadata structures in HFS+ have global locks. Only one process
> can update the file system at a time.

Holy heck! How does that work in practice? Do operations get queued and then a
single kernel process who can take the lock do the atomic updates?

I have to imagine that is going to cause a bottleneck however, as all non-read
operations need to update the metadata (e.g. timestamp, maybe size if it is
stored).

That all being said I haven't noticed OS X being particularly slower to do
things than e.g. Windows. So if that is the case they're hiding it well.

~~~
mindajar
HFS+ is an oddball filesystem among FSes that anyone actually uses. Since the
earliest days of the Mac, it's been able to e.g. track file identity even as a
file is moved around, and the metadata to support this lives in a volume-wide
tree called the Catalog File. So, there's your global lock. (Add an SSD as
necessary for better performance.)

OS X already has some pretty high-level file and metadata APIs not found on
other systems, so maybe Apple's future plans don't look like a traditional
Unix file system at all. They've already demonstrated they know how to make a
very weird, non-Unix filesystem look like one. ;)

------
kannonboy
Scroll down for Linus' commentary.

> "The true horrors of HFS+ are not in how it's not a great
> filesystem, but in how it's actively designed to be a bad
> filesystem by people who thought they had good ideas."

There doesn't seem to be a way to deep link to comments in G+?

~~~
astrodust
G+ is a train wreck for a number of reasons, this included.

Yeah, he's not a fan of HFS+ at all. Wasn't the plan to move to ZFS prior to
the Oracle acquisition of Sun? Hopefully that ends up back on track somehow.

~~~
donavanm
Project predated oracle, back in the osx 10.5-10.6 days when there was the
dtrace integration as well. There were internal builds of the zfs kext and at
one point I saw leaked source on the Internet. Project was abandoned because
of legal disagreements over cddl licensing, as I recall.

~~~
the_why_of_y
CDDL licensing? more likely NetApp scared Apple off with patent aggression

[http://en.swpat.org/wiki/NetApp%27s_filesystem_patents](http://en.swpat.org/wiki/NetApp%27s_filesystem_patents)

------
AceJohnny2
I'm rather amused (and, as always, a bit saddened) by the comment of Terry A.
Davis, of TempleOS fame/notoriety.

"linux doesn't have to search parent directories for file-not-found, but I
do."

wat

Edit: further parsing reveals he implemented a (read-only) overlay system in
his FS. Interesting, I wonder what the side-effects (vulns) could be?

~~~
iso8859-1
The word "vulnerability" doesn't make sense when talking about TempleOS, since
it does not even attempt to offer any kind of security.

------
wiremine
"Quite frankly, HFS+ is probably the worst filesystem ever."

Can anyone summarize why he thinks this?

~~~
wazoox
HFS+ has been patched with duct tape and pieces of cardboard for 20 years;
receiving journaling, support for Unix attributes, extended attributes, 64
bits sizing, multi-processing, multi-users, hard links, etc. over the years.
Really it's a sort of monument to kludge. It should have been ditched like 10
years ago.

~~~
verbatim
I'm not sure that's terribly different from ext4's lineage, which is generally
accepted (AFAIK) as a pretty good file system, even if not cutting-edge.

------
gchpaco
HFS+ is probably the worst filesystem in common use right now; even FAT has
the benefit of simplicity. Most of its issues, however, are with its horrific
implementation; the Unicode naming is kind of bad but Linus manages to be
wrong about several things.

Regarding case sensitivity: it is generally accepted among the user interface
crowd that (Western) users don't really understand that 'C' and 'c' are
different things; they're "both" 'c'. Case-preserving is thus the accepted
practice. However case manipulation is not an operation that can be done
absent a locale; my go to example here is that 'i' upcases to 'I' unless
you're a Turk in which case it upcases to 'İ'. Similar although not quite as
bad is the fact that 'ß' upcases to 'ẞ' U+1E9E in some exotic circumstances;
see
[http://en.wikipedia.org/wiki/Capital_ẞ](http://en.wikipedia.org/wiki/Capital_ẞ)
for details. Similar limitations apply to sorting, which users also expect.

Regarding Unicode: NFD is a normalization format; it converts 'é' U+00E9 and
'é' U+0065 U+0301--which are semantically identical--into the same coding. As
it happens NFD picks U+0065 U+0301 for that string; NFC picks U+00E9. Any time
there is ambiguity, NF[CD] will retain the original ordering. Calling it
"destroying user data" is meaningless histrionics. Most of the time we tend to
use NFC. I am told that NFD has certain advantages for sorting, where one
might want to match the French word 'être' with the search string 'etre'; in
NFC this requires large equivalence tables but in NFD the root character is
the same in both cases. Linus's claim 'Even the people who think normalization
is a good thing admit that NFD is a bad format, and certainly not for data
exchange.' has a big [citation needed] tag attached.
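
A small sketch of the two forms using Python's standard `unicodedata` module. The helper `strip_marks` is hypothetical and only illustrates why a decomposed form makes accent-blind matching easy; it is not a claim about how any real search or sort is implemented:

```python
import unicodedata

precomposed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"        # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

nfd = unicodedata.normalize("NFD", precomposed)   # picks the two-code-point form
nfc = unicodedata.normalize("NFC", combining)     # picks the single code point


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop combining marks to get base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "\u00eatre" is the French word written with a precomposed e-circumflex.
matches_plain = strip_marks("\u00eatre") == "etre"
```

Once the text is in NFD, the base letter is already a separate code point, so no per-letter equivalence table is needed for this kind of match.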

As it happens, my personal belief is the following: Given that users expect
case insensitivity and locale-specific ordering, which complicate filesystem
design tremendously. Given that users mostly interact with the system through
GUI dialogs, which already hide system files (files with the hidden bit in
HFS+, or files starting with '.'). Therefore, extract the case insensitivity to
a layer, used by the GUI, which can understand the user's locale and so fold
case properly. This layer should be available to command line applications so
that they can use the same rules if they so choose. The underlying filesystem
will then be case sensitive, but is still used to encode Unicode data; the
right thing to do here is to normalize. Either NFC or NFD is fine, really.

For pedants: the related NFKC/NFKD forms add a canonicalization step and are
absolutely not semantically safe in any way, for all that they're useful for
sorting.
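
To illustrate why the K forms are lossy, here is a short sketch with Python's `unicodedata`; the two sample strings are chosen purely for illustration:

```python
import unicodedata

samples = {
    "fi ligature": "\ufb01",         # LATIN SMALL LIGATURE FI
    "superscript two": "x\u00b2",    # x followed by SUPERSCRIPT TWO
}

# NFC leaves both strings untouched.
nfc = {name: unicodedata.normalize("NFC", s) for name, s in samples.items()}

# NFKC applies compatibility mappings: the ligature becomes two letters
# and the superscript becomes an ordinary digit.
nfkc = {name: unicodedata.normalize("NFKC", s) for name, s in samples.items()}
```

After NFKC, "x squared" and "x2" are the same string, which is convenient for matching but would be a real change of meaning in a file name.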

~~~
Someone
And then, you receive a zip file from Linus that has file.c, File.c and FILE.c
files on it. You extract it, and then? They either end up on disk, breaking
your case-insensitive UI layer (yes, you can see those files, but can you copy
them elsewhere?), or they don't, breaking the makefile that's also in the
archive. Here be dragons.
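
A rough sketch of how an extractor could at least detect the problem before writing anything. `case_collisions` is a hypothetical helper, and using `str.casefold()` as the key is an assumption: a real case-insensitive filesystem applies its own folding rules, which may differ.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that would collide under a caseless comparison."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


archive_listing = ["file.c", "File.c", "FILE.c", "Makefile"]
collisions = case_collisions(archive_listing)
```

Detection is the easy part; the sketch says nothing about what to do next, which is exactly the dilemma described above.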

Locale-specific ordering of course _must_ be done outside the disk because
disks may move between systems with different locales, locales can be changed
at will, and multiple users could read the same directory with different
active locales (well, must: one could store a locale for sorting per directory
and force that on the user, but that is madness)

Also, reading
[http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.ht...](http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.html)
(which I can't find anymore on Apple.com), HFS+ doesn't quite use full NFD
because it sometimes destroys information that Apple deemed worth keeping.

~~~
tedunangst
Creating three files named file.c, File.c, and FILE.c seems like being user
hostile for the sake of being user hostile. Even for people using a case
sensitive filesystem, it's a dick move. Imagine talking about your project
over the phone. "Yeah, the problem is in file.c. No, not File.c, file.c.
Idiot! I clearly said file.c, not FILE.c!"

~~~
gchpaco
Yeah, I don't think that's a reasonable thing to do even if you can do it.

~~~
jacquesm
I've seen both Makefile and makefile in one archive.

~~~
tedunangst
Which is pretty ridiculous. Convention dictates that people use Makefile so
that's the file people read and edit, yet the make utility reads makefile
first by default. Including both is a great way to confuse users.
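
For reference, GNU make documents its default search order as GNUmakefile, then makefile, then Makefile. A minimal sketch of that lookup, with `pick_makefile` as a made-up name and a plain set standing in for the directory contents:

```python
GNU_MAKE_DEFAULT_NAMES = ("GNUmakefile", "makefile", "Makefile")


def pick_makefile(directory_entries):
    """Return the first default name present in the directory, or None."""
    for candidate in GNU_MAKE_DEFAULT_NAMES:
        if candidate in directory_entries:
            return candidate
    return None


# With both files present, the lower-case one wins.
chosen = pick_makefile({"Makefile", "makefile", "main.c"})
```

On a case-insensitive filesystem the two names cannot coexist in one directory, so an archive shipping both cannot be extracted faithfully there.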

~~~
jacquesm
Yep. It definitely had that effect on me :)

------
niels_olson
1) What are the odds Tim Cook or Craig Federighi will hear about this?

2) What are good filesystems for OS X to adopt? OS X supports other
filesystems. Is there a way to force it to install the OS onto a different
filesystem, like ext4?

~~~
mindajar
Re: 2), there are significant parts of OS X that seem to either check for HFS+
or rely on its implementation and bugs, and those things don't work properly
on other filesystems. OpenZFS, for instance, still doesn't work with
Spotlight, on which a surprising number of things depend these days.

[https://openzfsonosx.org/wiki/FAQ#Limitations](https://openzfsonosx.org/wiki/FAQ#Limitations)

If you want to use all the software features of your Mac, your only option is
HFS+.

~~~
knweiss
Soon it will support Spotlight. See
[https://openzfsonosx.org/wiki/Changelog#1.3.1-RC5](https://openzfsonosx.org/wiki/Changelog#1.3.1-RC5)

------
makecheck
I agree with Linus from a technical point of view but I think Apple had many
considerations here.

A number of games (and possibly other programs) on the App Store alone
specifically mention that they will _not_ work on Macs configured with case-
sensitive file systems. My guess is that this aids programmers who may have
ported something from Windows and not tested all possible file/path
dependencies.

This may also help users when copying files from Windows network disks or Mac
legacy systems where (from their point of view) they expect things to work.

------
lispm
I agree that HFS+ and its API are outdated.

Case insensitivity I find useful, OTOH.

------
jen729w
_Ding!_

------
yuhong
BTW, one other stupidity is that Linux's HFS+ implementation refuses to mount
journaled volumes when Apple designed it to be backward compatible.

~~~
ajross
How could that possibly work? I presume by "backward compatible" you mean that
the data outside the journal remains consistent, and the journal layer is
capable of detecting modifications made by non-journaled mounts.

That's... fine, I guess. It prevents the obvious corruption cases. But the
only plausible recovery mechanism after such a mount is to throw out the
journal! That's not likely to be acceptable to most users ("I booted to linux
and back, and now a bunch of new files disappeared!").

That's stretching the meaning of "compatible" too far.

~~~
yuhong
Pretty much. Look for lastModifiedVersion in:
[http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.ht...](http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.html)

~~~
ajross
Then I fail to see how this is a stupidity. Linux is doing the right thing:
there's no way for it to mount the filesystem without damage.

------
jeffehobbs
DING

------
datashovel
Thank God we have someone like Linus to keep BIG TECH in line. It's not all
about marketing Apple, Microsoft. It's about good design, and openness.

~~~
datashovel
I would expect far more downvotes than that! Come on what's wrong with you
"Hacker News". Only 1 downvote when I'm slammin' Apple for their shitty
strategy? Surely some of you have at least some downvotes available to use
against that!!!

~~~
datashovel
Even after a good night's sleep, I'm happy I decided to post this comment. The
people hiding behind their karma scores to downvote perfectly legitimate and
contextually relevant comments are one of the most frustrating and annoying
things about Hacker News.

