
Linus Torvalds: Apple's HFS+ is probably the worst file-system ever - arnieswap
http://www.linuxveda.com/2015/01/13/linus-torvalds-apples-hfs-probably-worst-file-system-ever/
======
forgottenpass
This is just blogspam of something that was popular on social media yesterday.
Which is where Linux Veda found it in the first place and proceeded to not add
any meaningful commentary or context.

It was already discussed to death here:
[https://news.ycombinator.com/item?id=8876319](https://news.ycombinator.com/item?id=8876319)

------
kevingadd
For reference, NTFS is case-sensitive. _Win32_ is case-insensitive. The case
sensitivity is implemented at a layer above the filesystem so that classic
Windows apps will behave the way users expect. (Case sensitive filesystems are
confusing for users and mostly create opportunities for horrible mishaps.) As
a result, the POSIX subsystem can store case sensitive files just like a linux
FS would, with 'FOO' and 'foo' being separate files right next to each other:

[http://support.microsoft.com/kb/100625](http://support.microsoft.com/kb/100625)

Case insensitivity like win32's is still kind of a mess, but it's unfair to
paint the developers of NTFS as idiots when they actually got this right.
Above the FS layer is the correct place to implement case insensitivity,
because case insensitivity is a feature that aids _users_ in _selecting_
files, not a feature that aids applications in opening them or creating them.

~~~
durin42
I don't know. Now you end up in a circumstance where you have N files named
some variant of "Foo" and the GUI layer can't distinguish them.

Enforcing this invariant at the filesystem layer (with well defined semantics,
which HFS+ at least documents semi-well) seems eminently reasonable to me.

------
babl-yc
As a developer I could see case-sensitive file systems as more "proper" and
predictable. But as an everyday Mac user, I'd be annoyed if I accidentally
selected "upload.png" instead of "Upload.png" and sent the wrong file.

~~~
userbinator
_But as an everyday Mac user, I'd be annoyed if I accidentally selected
"upload.png" instead of "Upload.png" and sent the wrong file._

That assumes you somehow created two files with names differing only in case,
which I'd argue would happen only if you were naming your files with such
generic names. Also, I don't think the fact that two files' names _look_
similar should be any reason to treat them as the same - otherwise, would you
suggest that the letter O and the number 0 (along with the letter o) should be
considered identical for filenames?

~~~
FreezerburnV
You obviously don't understand how a lot of people actually use their
computer: [http://xkcd.com/1459/](http://xkcd.com/1459/)

(please note the tongue firmly planted in cheek, even if a lot of people
actually DO work like this. which is terrifying)

------
cremno
Another “funny” detail:

>The maximum representable date is February 6, 2040 at 06:28:15 GMT.

[http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.ht...](http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.html)

~~~
stygiansonic
Interesting: I didn't realize the "Epoch" in this case was "midnight, January
1, 1904, GMT". Given that the representation is stored as an unsigned 32-bit
integer, it seems like you cannot represent dates older than this "Epoch"
starting value. Since it's a file system, that's probably of much less concern
than a more general date-time format.

However, does anyone know why this particular date was
chosen? It's far enough back to not cause any issues and appears to be close
to the minimum date-time value obtained using the minimum value for signed
32-bit integer and the standard Unix Epoch (which would be around 13 Dec
1901), but other than that?
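
Both dates in this subthread can be checked with a few lines of Python. This is only a sketch: it assumes the timestamp is a plain count of seconds in a 32-bit field, as described above, and the names `HFS_EPOCH`, `hfs_max` and `unix_min` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Epoch quoted above: midnight, January 1, 1904, GMT.
HFS_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value an unsigned 32-bit seconds counter can hold.
hfs_max = HFS_EPOCH + timedelta(seconds=2**32 - 1)

# Smallest value a signed 32-bit seconds counter can hold on the Unix epoch.
unix_min = UNIX_EPOCH + timedelta(seconds=-(2**31))

print(hfs_max.isoformat())   # 2040-02-06T06:28:15+00:00
print(unix_min.isoformat())  # 1901-12-13T20:45:52+00:00
```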

~~~
pjc50
[https://support.microsoft.com/kb/180162?wa=wsignin1.0](https://support.microsoft.com/kb/180162?wa=wsignin1.0)

The 1900 epoch needs additional code to handle the fact that 1900 was not a
leap year. Why not the UNIX epoch? Well that's another question entirely.

~~~
Someone
Also:
[http://support.microsoft.com/kb/214326](http://support.microsoft.com/kb/214326):

_"Explains why the year 1900 is treated as a leap year in Excel 2000"_

Mac OS was written in a time of memory scarcity; 1904 is just an easier choice
for the epoch than 1900. That's also why they didn't choose the Unix epoch, as
it would have required supporting multiple epochs, one for disk time stamps,
and one to handle birth dates (time stamps on classic Mac OS are unsigned;
there were no times before the epoch. And yes, there were no Mac users over 80
:-))

------
jansanchez
Yesterday's discussion:
[https://news.ycombinator.com/item?id=8876319](https://news.ycombinator.com/item?id=8876319)

------
kaolinite
There are very real problems with HFS+ related to reliability and data
corruption. It really is quite bad. Linus's argument is silly, however, and
purely a matter of taste. Case insensitivity certainly isn't the reason that
HFS+ is bad.

~~~
jiggy2011
I don't even understand why he doesn't like case insensitivity. I can't think
of a reason to have two files with the same name that vary only in
capitalisation.

~~~
rogerbinns
Are these two names the same case insensitively? Pick and PICK. Hint:
[https://en.wikipedia.org/wiki/Dotted_and_dotless_I](https://en.wikipedia.org/wiki/Dotted_and_dotless_I)

How about Straße, STRASSE and strasse?

For an American audience, how about Résumé and RESUME and resume?

How about Ⅸ and IX? Hint:
[https://en.wikipedia.org/wiki/Unicode_equivalence#Normalizat...](https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization)

The answer to all that is that it depends on the user and their locale
settings. So to get this right you would have to pass the locale settings into
system calls that go into the filesystem so it can do the right thing on every
call. But then what happens if the locale later gets changed? Before some file
names may be considered equivalent, but with the new one they are different
and things break. Ok how about a system wide locale setting that can't be
changed? Are you going to require a disk reformat each time a different user
is on the system, or a multilingual user who travels?
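
A rough Python sketch of why these questions have no single answer: each comparison below gives a different result depending on which folding or normalization rule is picked. Python's string methods are locale-independent, so this only approximates what a locale-aware system might do, and `strip_marks` is a hypothetical helper written for this example.

```python
import unicodedata

def strip_marks(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Simple lowercasing keeps the sharp s, so these two differ...
lower_equal = "Straße".lower() == "STRASSE".lower()
# ...while full Unicode case folding maps the sharp s to "ss".
fold_equal = "Straße".casefold() == "STRASSE".casefold()

# Accented names only match if the comparison chooses to discard accents.
accent_equal = "Résumé".casefold() == "RESUME".casefold()
accent_stripped_equal = strip_marks("Résumé").casefold() == "RESUME".casefold()

# The Roman numeral nine is one code point (U+2168); only compatibility
# normalization (NFKC) turns it into the two letters I and X.
numeral = "\u2168"
nfc_equal = unicodedata.normalize("NFC", numeral) == "IX"
nfkc_equal = unicodedata.normalize("NFKC", numeral) == "IX"

# A Turkish rule would lowercase I to dotless i (U+0131); Python's
# locale-independent mapping does not, so this comparison fails.
turkish_equal = "PICK".lower() == "p\u0131ck"

print(lower_equal, fold_equal)              # False True
print(accent_equal, accent_stripped_equal)  # False True
print(nfc_equal, nfkc_equal)                # False True
print(turkish_equal)                        # False
```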

The right solution is exactly what Unix did in the first place. The filesystem
stores names exactly as is without having to deal with locales. User space is
then responsible for doing the right thing for a user. For example a word
processor can treat Résumé and Resume as the same for Americans in file
selection dialogs.

User space already has to deal with all this, for example by sorting file
names. Here is how hard that is
[http://www.unicode.org/reports/tr10/](http://www.unicode.org/reports/tr10/)
and a simple example is if z comes before or after ö. Or when showing a list
of Swedish names to a German user do you use Swedish or German sort order?

A similar analogy is dealing with time. The best approach is to store the time
as UTC in the filesystem and let user space show it in whatever way is
relevant for the user. On the other hand if you store local time in the
filesystem, you will end up in a world of hurt trying to work around that.
(UTC to local time is a lossy conversion making local time back to UTC hard to
work out.)
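
To make the lossy part concrete, here is a small Python sketch. It assumes a zone that falls back from UTC-4 to UTC-5 at 06:00 UTC on 2 November 2014 (as US Eastern time did), modelled with two fixed offsets rather than a real time zone database; the names `EDT` and `EST` are just labels for those offsets.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the zone before and after the fall-back.
EDT = timezone(timedelta(hours=-4))
EST = timezone(timedelta(hours=-5))

# Two distinct instants, one hour apart, stored as UTC.
before_change = datetime(2014, 11, 2, 5, 30, tzinfo=timezone.utc)
after_change = datetime(2014, 11, 2, 6, 30, tzinfo=timezone.utc)

# What a filesystem storing bare local time would record for each.
wall_before = before_change.astimezone(EDT).replace(tzinfo=None)
wall_after = after_change.astimezone(EST).replace(tzinfo=None)

print(wall_before)                # 2014-11-02 01:30:00
print(wall_after)                 # 2014-11-02 01:30:00
print(wall_before == wall_after)  # True
```

Both instants collapse to the same wall-clock value, so nothing stored on disk says which UTC time was meant.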

~~~
archagon
But half the time, I'm having to deal with file access and creation in the
terminal. That's as important a user experience to me (a power user) as
anything in the GUI. Does Unix change the way the terminal handles files
depending on the locale? My understanding is that it does not. (If it did,
maybe that would be the best solution for everyone, but I imagine a lot of
geeks would be up in arms about how the terminal experience is not
consistent...)

~~~
rogerbinns
There are multiple layers at play. For example someone looking at a file
selection dialog in a gui has at least these layers, lowest level first:

\- Filesystem implementation (actual on disk bytes)

\- Filesystem compatibility (adjustments to OS semantics)

\- Generic filesystem layer in the kernel

\- System call interface (between userland and kernel)

\- Userland low level library (eg system call wrappers)

\- Higher level libraries (eg libc)

\- User interface libraries

\- App level modules & libraries

The problem with the filesystem being case insensitive is that it is the first
level above, where there is virtually no context or user information. That is
the hardest place to work out if the user considers Résumé and RESUME to be
the same. The lower down the list the easier it gets. The important point is
that how you store things and how you present them do not need to be the same.

Command line tools typically use the libc level of things. It is perfectly
possible to make that level so the system appears case insensitive (another
poster mentioned Windows doing that).

Your Unix command line experience is very much affected by your locale. For
example the ls command does the following:

\- Sorts filenames

\- Displays sizes (different locales have different conventions for floating
point representation, digit grouping etc)

\- Displays times

Environment variables like $LANG, $LC_*, $TZ etc control this in Unix.

Individual programs can also alter behaviour. For example bash can be
configured to do case insensitive filename completion.
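
As an illustration of doing this in user space, here is a hypothetical Python sketch of case-insensitive completion layered over names that are stored case-sensitively. The function `complete` and the sample names are invented for this example; this is not how bash implements it.

```python
def complete(prefix, names):
    """Return the stored names that start with prefix, ignoring case."""
    wanted = prefix.casefold()
    return [name for name in names if name.casefold().startswith(wanted)]

# Names exactly as a case-sensitive filesystem would store them.
stored = ["README", "readme", "Report.txt", "notes.txt"]

print(complete("re", stored))  # ['README', 'readme', 'Report.txt']
```

The stored names are never altered; only the matching step ignores case, so the user still sees and picks the exact file.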

TLDR: while case insensitivity is a nice user experience for humans, it is
awful for the lower layers including filesystems. The closer a program is to a
human, the easier it is to give that nice user experience. There is no
requirement for a case insensitive user experience to then make filesystems
also implement that. And as my examples show, they couldn't do it correctly
anyway.

~~~
woodman
Don't forget the additional fun that terminal emulators add in the journey
from libc to your eyeballs. Some of the most difficult bugs I've worked on
involved grep with the color flag and automatic build systems...

------
sigsergv
He's absolutely right about NFD, it's horrible. If you're using plain latin-1
you usually don't have any problems. And everything changes if you are using
“composite” characters like à or ё or й. It's not just an HFS+ issue; it's a
system-wide issue that breaks mounted network resources, for example. Or even
USB sticks.

------
archagon
Technical hurdles aside, I just can't agree that case-sensitivity makes sense.
To most people, an "a" and an "A" are not different symbols; they're the same
letter with "uppercase" formatting applied, much like bold or italics. And
although one could argue that you could make a case-insensitive GUI layer
while keeping the implementation case-sensitive, I feel that the terminal can
be just as important from a UX perspective as anything else in the OS. After
all, it's one of the main places power users create and access files.

Given how much Siracusa and Torvalds (to an extent) advocate for better user
experience over ease of technical implementation, I feel I'm missing
something. Our job as programmers is to encompass the complexity of the real
world in our programs, not to make our users think like machines.

~~~
forgottenpass
_not to make our users think like machines._

If differentiating between "file" and "File" is thinking like a machine, how
isn't differentiating between "my files" and "myfiles" thinking like a
machine?

Should "my files" and "myfiles" be collapsed to mean the same thing? What
about "files"?

~~~
archagon
An "f" is not a different character from an "F". To most people, it's the same
letter with special formatting applied. They don't see it and think, "Oh,
that's ASCII character 102." An F is an f, no matter how you type it.

On the other hand, "myfiles" and "my files" have different numbers of
characters. They sound different in your head, and you can easily point to the
difference. It's the same as typing "alien" versus "a lien".

~~~
userbinator
A Ford is a very different thing from a ford. You can drive the former through
the latter.

If "most people" cannot tell the difference, then I believe that would be a
failing of the education system.

------
AshFurrow
I wonder how much Adobe's incompatibility with case-sensitive filesystems
affected Apple's decisions:
[https://helpx.adobe.com/creative-suite/kb/error-case-sensiti...](https://helpx.adobe.com/creative-suite/kb/error-case-sensitive-drives-supported.html)

------
mproud
Focusing on the non-technical details… case-sensitive filesystems just seem
horrible for consumers!

At my day job, I work with people who have barely used a computer before. I
could see an 80-year-old woman having three versions of a letter she typed in
Pages or Word, simply because she had mixed her use of capital letters, and then
mistaking one version for another and deleting the wrong file.

------
haberman
I seem to remember a story of how this decision was made (or at least a part
of it). I thought I remembered hearing that Steve Jobs was in a roomful of
developers and took a quick poll of whether they liked case sensitive vs case-
preserving filesystems, and case-preserving won.

I could be totally making this up though. Anyone remember hearing this and can
confirm?

------
mproud
Another misleading headline, simply for linkbait and dollars.

------
duaneb
This is just Linus grumping about subjective tastes: he rags on the default
case insensitivity of HFS+ and the Finder, neither of which he's forced to use
even if he's stapled to a Mac. If you want a case sensitive filesystem,
reformat your damn Mac! I've seen it confuse enough non-computer-savvy people
that I wouldn't want Apple to change their filesystem for the sake of developers
who need to distinguish between headers by case.

NB: I use a case-sensitive HFS+ install for cross-compiling when it's
necessary.

~~~
outworlder
Don't underestimate the power of defaults.

Also, this can trip up in several other contexts besides headers. For
instance, someone renames a file in their repository, changing just a filename
case. And then you'll have to scratch your head a bit to understand a git
issue that will invariably arise when you pull in the changes. Unless you are
fortunate enough not to have a deadline, you won't be so quick to reformat the
machine.
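
One way to spot this ahead of time is to look for tracked paths that differ only by case. The Python sketch below is a rough approximation: it uses `str.casefold()`, which is not the exact folding rule of HFS+ or any other filesystem, and `case_collisions` and the sample paths are hypothetical.

```python
from collections import defaultdict

def case_collisions(paths):
    """Group paths that become identical once case is folded."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [names for names in groups.values() if len(names) > 1]

# Hypothetical list of paths, e.g. the old and new names around a rename.
tracked = ["src/Utils.h", "src/utils.h", "README", "docs/index.md"]

print(case_collisions(tracked))  # [['src/Utils.h', 'src/utils.h']]
```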

This actually happened to me. In the end, it was quicker to spin up a linux vm
than to reformat the machine.

EDIT: Also, you appear to have missed the part that says "There is a case
sensitive option, but Apple actively hides it and doesn't support it." No
support is really, really, really bad. No one will deploy OSX machines
formatted like that if it is unsupported.

~~~
tambourine_man
OS X is primarily a consumer's OS. Every time a decision has to be made
between making the life of the programmer or the consumer easier, Apple will
choose the latter.

~~~
walterbell
On iOS, Apple made a consumer-unfriendly change when they disabled virtual
printing, e.g. local Print-to-PDF from any application to a virtual-printer
app.

------
api
HFS+ and NTFS both suck pretty hard. There seems to be some kind of pessimism
filesystem principle in popular mainstream OSes.

It's kind of impressive that Apple and MS have managed to make such balls of
crap work as well as they have.

------
tambourine_man
What a shitty article. HFS+ has many problems, but case insensitivity is a
feature. It's good for end users. There's a case sensitive version of it, but
it's not the default, and for a good reason.

If you want a good critique of HFS+, check one of John Siracusa's many
rants[1].

Also, Gruber is not the author of Byword:

 _Even John Gruber of Daring Fireball and author of Byword…_

[1] [http://arstechnica.com/apple/2011/07/mac-os-x-10-7/12/](http://arstechnica.com/apple/2011/07/mac-os-x-10-7/12/)

~~~
gondo
what a shitty comment. did you try to use some Adobe product on the case sensitive
version of HFS+? let me guess, no you didn't. because if you had, you would
find out that some programs simply don't work on case sensitive HFS+, most
likely because of people like you, who simply "know" what's better for the end user

~~~
durin42
It's almost certainly a bug in a build script somewhere, and they're
dynamically linking against something via the wrong name. I once linked
against Quicktime.framework, and it worked on everything but my case-sensitive
Mac. QuickTime.framework worked everywhere.

So your Adobe example is flawed. They should be testing on case-sensitive
volumes, but the vast majority of normal users don't want a file named README
and readme in the same directory, and enforcing that invariant seems
reasonable to me.

------
ZoFreX
Really? The developer who wrote software for Windows and OS X that depends on
a case-sensitive file-system is going to throw stones about bad software?

Windows and OS X use case-insensitive file-systems. That sucks. But if you are
going to release software for those systems, you need to accept that that is
the reality, and write your software accordingly. This is hardly the first
issue Git has had due to this; Linus's time would be better spent fixing these
issues rather than stamping his feet and expecting OS X and Windows to change.

~~~
fulafel
Linus didn't write or port git for Windows or OS X. In fact for the longest
time git on Windows required Cygwin. I doubt Linus was even much involved in
git maintenance by the time native Windows support was merged... or was it
ever? Is msysgit a fork?

