A brief history of APFS (2022) (eclecticlight.co)
138 points by tim_sw on March 28, 2023 | 123 comments



As a long-time Korean user of macOS, I hate the fact that APFS is normalization-insensitive yet normalization-preserving.

As a result, if you create a Korean-named file yourself in Finder, the file name is stored in Unicode NFD (decomposed) form, but if you create a Korean-named file in a zsh shell, the file name is stored in Unicode NFC (composed) form.

You don’t notice the difference until you run `ls | xxd`, but the bytes are different even though the names look the same.

It’s an invisible mess that people don’t realize exists until they hit it. And a lot of programs get subtly confused when fopen(A/B) succeeds but listdir(A) doesn’t list B.
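
A minimal way to see the two forms side by side (a sketch in Python; the Korean name is just an example):

    import unicodedata

    name = "한글.txt"                                    # as typed in a terminal (NFC)
    nfc = unicodedata.normalize("NFC", name)            # composed: one codepoint per syllable
    nfd = unicodedata.normalize("NFD", name)            # decomposed: one codepoint per jamo

    print(nfc == nfd)                                   # False: different codepoint sequences
    print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))  # 10 vs 22 bytes here
    print(unicodedata.normalize("NFC", nfd) == nfc)     # True: equal once normalized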


I think I observed this problem recently too, but with Japanese file names.

Basically I had a Git repository and didn’t know about `git config core.precomposeUnicode`. When the repository was synchronized to a Linux system, the Linux side could sometimes end up with two files whose names look the same but have different normalizations. (Because I think ext4 doesn’t normalize Unicode?) That took me about an afternoon to fix.
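
If you hit this, a quick way to spot the colliding pairs on the Linux side is to group each directory’s entries by a single normalization form (a sketch in Python; the repo path is hypothetical):

    import os
    import unicodedata
    from collections import defaultdict

    for root, dirs, files in os.walk("/path/to/repo"):       # hypothetical checkout path
        groups = defaultdict(list)
        for name in dirs + files:
            groups[unicodedata.normalize("NFC", name)].append(name)
        for names in groups.values():
            if len(names) > 1:                               # same-looking names, different bytes
                print(root, [n.encode("utf-8") for n in names])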


> Because I think ext4 doesn’t normalize Unicode?

Yeah, native Linux filesystems don't really know anything about Unicode at all at their heart. They work on raw bytes, reserving only 0x00 (NUL) and 0x2F ('/'). Anything else goes in a filename as far as the kernel is concerned (including evil stuff like incomplete multibyte sequences or other invalid UTF-8 byte sequences). User space is welcome, and encouraged, to treat filenames as UTF-8 on modern systems, but that isn't strongly enforced anywhere.
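
You can see this from user space: on Linux, Python will happily take raw bytes as paths, including byte sequences that aren't valid UTF-8 (a sketch; the /tmp/demo directory is hypothetical):

    import os

    # b"\xff" is not valid UTF-8, but the kernel only forbids 0x00 and 0x2F ('/')
    os.mkdir(b"/tmp/demo")
    with open(b"/tmp/demo/caf\xff.txt", "wb") as f:
        f.write(b"hello")

    print(os.listdir(b"/tmp/demo"))   # [b'caf\xff.txt'] -- the raw bytes come straight back
    print(os.listdir("/tmp/demo"))    # ['caf\udcff.txt'] -- a surrogate stands in for the bad byte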


What can I do with Japanese characters in Zip files that come out all messed up and Cyrillic when extracted under Linux?


It depends; Japanese filenames are sometimes stored in an 8-bit codepage, and there are several.

https://en.wikipedia.org/wiki/Code_page

https://stackoverflow.com/a/45583116

https://github.com/m13253/unzip-iconv
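
With a recent Python (3.11+), the zipfile module can also be told which legacy codepage the stored names use, which sidesteps the mojibake (a sketch; the archive name is hypothetical):

    import zipfile

    # metadata_encoding only applies to entries not flagged as UTF-8, read mode only
    with zipfile.ZipFile("photos.zip", "r", metadata_encoding="shift_jis") as zf:
        for info in zf.infolist():
            print(info.filename)          # Japanese names decode correctly
        zf.extractall("photos")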


Thanks. `unzip -O Shift_JIS <file>` did the trick.


I had an issue where I had Unicode file names on FAT and I wanted to copy them to APFS on a Mac, and half of them I could not copy, because they were in… uh, either NFD or NFC, one of the two. macOS just showed some random error when I tried to copy.

What I eventually needed to do was go to Windows, install WSL there, then run a script that recursively renamed them from the one form to the other (on the FAT volume), and only then could I copy them from FAT to APFS on the Mac.

(I could probably skip the WSL and do it with some Windows-only tooling, but that would take me even longer.)

So this little exercise involved macOS, Windows, and Linux (and Plan 9, if you count the WSL filesystem shenanigans).


Hahahaha, you're at the surface looking at a well of problems that goes all the way to the center of the earth.


Is this a bit like how Windows preserves capitalization but doesn't otherwise care about it (except in WSL of course), or is it something more complex?


Kind of. Imagine if capital letters looked identical to regular letters just with a different byte representation, but Windows preserved them regardless.

Well, it turns out that Unicode can represent a lot of characters with more than one codepoint sequence (for example, é can be a single codepoint, or the codepoint for e followed by the codepoint for a combining acute accent).


Notes does something like this for some reason. The quotes used in the Notes app, when copy-pasted into other apps like Terminal or a code editor, are not the same as the quotes we use every day. I think some apps are smart enough to fix this by default (perhaps IntelliJ?), but otherwise it’s the source of a non-negligible amount of annoyance if you like to use Notes for whatever reason.


That’s because curly quotes are different characters from the straight quote characters that you can see on your keyboard. Some apps will convert straight quotes to curly quotes to “help” you.

https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html


In System Settings somewhere you should be able to disable smart quotes (or something like that).


Similar except that you can see the difference between lowercase and uppercase.


Windows has a flag to make it respect case, but it would probably break a bunch of programs. I usually see it with Samba shares, but you can get it to do it on the local filesystem too if you try hard enough. It's also a bit of a pain when it does happen and you get two files with the 'same name': one of them will be basically invisible to most of the normal Windows tools.
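
(For reference, the per-directory switch on recent Windows is roughly the following; this is from memory, so treat the exact syntax as an assumption:)

    fsutil.exe file setCaseSensitiveInfo C:\some\dir enable
    fsutil.exe file queryCaseSensitiveInfo C:\some\dir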


APFS does the exact same thing with capitalisation as well (presuming you're using case-insensitive APFS, which is the default).


I thought the default has been case-sensitive for a while.


You thought wrong.


I think the solution is to stop trying to over-manage user- and filesystem-input.

Be faithful and preserve what was entered rather than trying to "second-guess" the user.


What is the solution here though? They forced normalization in HFS+ and everyone hated that even more since it changed your file names when you copied over files. And they can't be normalization-sensitive because then you can't type in a file name.


The solution is simply what Linux does: treat filenames as bags of bytes instead of strings (but this would also include case-sensitivity). At least then the behaviour is completely straightforward and transparent down in the file system layer.

Unicode details like normalization can be handled up in the application layer.


“Bags of bytes” has its own problems. If the idea of ‘character encoding’ is alien to the file system, the application layer cannot reliably show file names to users, other than as a sequence of byte values. It cannot even assume 0x41 is an ‘A’, for example.

Historically, you configured your OS to match (what you thought was) the encoding used on your disk, but that starts to break when you have network disks or when users exchange floppies.

In practice, I think just about every Unix/Linux file system at least assumes ASCII encoding for the printable ASCII range. How else does, for example, your shell find “ls” on your disk when you type those characters?

So, not even in Unix/Linux is a file name “a bag of bytes”.

I think file systems should enforce an encoding, and provide a way for users of the file system to find out what it is.

Apple’s HFS, for example, stored an explicit ‘code page’ ID so that the Finder knew, for example, to use the Apple Cyrillic encoding to interpret the bytes describing file names on that disk.

The modern way is to just say file names are valid Unicode strings, with UTF-8 being the popular choice. Unfortunately that introduces the normalization problem.


Are you running EBCDIC or a localization other than UTF-8 anywhere?

The filesystem shouldn't arbitrate localization issues. It's the wrong place to do it in terms of performance, area-of-responsibility, and portability.

NUL-terminated UTF-8 in memory and size+data on disk is the best encoding.

Normalization is a multifaceted problem that needs to be handled at or before the point where the filename argument hits the standard library. Sometimes normalization is NOT desirable, and it is preferable to normalize only for comparison with other strings while faithfully storing whatever was provided, verbatim.

Side effects, and pulling a 737 MAX MCAS underneath users, are the road to ruin.


> the application layer cannot reliably show file names to users, other than as a sequence of byte values.

As you say, it can just show the sequence of bytes (plus some visual hint that they are escaped). It is not like nonsensical filenames that seem composed of random numbers are uncommon (e.g. look into your .git dir).

> How else does, for example, your shell find “ls” on your disk when you type those characters?

This is purely a presentation layer issue. The filesystem assumes nothing.


The git files aren't supposed to be seen by a human being.

What you're suggesting would mean that a Korean user saving a file on their desktop and naming it in Korean would in turn see the filename turn into garbage, which, I hope we can all agree, is not a reasonable suggestion.


Of course, if the file name is correctly decodable in the user's preferred encoding (hopefully UTF-8), it should be shown as such. But giving up on supposedly malformed filenames shouldn't be an option for robust software.


Why though? What do you lose by forcing utf-8 at the filesystem level?


Compatibility with a few decades of data and with other systems.


But the future is longer than a few decades, so why extend the mistake further?


1) Not everybody, or even a majority, agrees that the current state is a mistake

2) Change what to what? There are multiple filesystems (and OSs) with different and sometimes incompatible filename encodings. Who picks the winner?

3) There is irony in dropping backward compat and requiring UTF-8, an encoding whose claim to fame is backward compatibility with ASCII-based systems (and which, not coincidentally, was designed by UNIX Elder Ones)


I was responding to your specific argument ignoring the bigger future for the smaller past; not sure how the broader questions about what others think etc. are relevant.

1) so? That will be true for anything

2) sequence of bytes to human encoding. Who picked the loser??? This would similarly be some unidentifiable group of people. You can start by picking the winner for yourself, even at a conceptual level

3) no irony, sequence of bytes is not ASCII


Because change costs way more than you think: maybe billions of dollars, impacting billions of users.

Creating 2 different systems carries risks: it has a learning curve, a support cost, and breaks everything that came before.

Perhaps, in a utopian green field, we redo a systems language, a portable systems API, and a common operating system with a compatibility layer to the old way on top.


Change has already partially happened since there is no one single filesystem with one behavior.

Not changing impacts many more users (all the future billions) and costs even more: the learning/support/etc. is all there, and it's harder to learn about a buggy system since you have more papercuts

Also, it doesn't break everything, we're not living in a dystopia


This is all true. We should also fix global warming, it's the same kind of problem. Massive investments now or even more massive losses tomorrow.


It's straightforward, transparent, and wrong, because filenames are not a bag of bytes; they are labels by and for humans, and no good API should ignore this simple fact.


Nothing stops you from encoding your human-labels as bytestrings. Most fellow humans do exactly that. "Gosh I wish the relation between this file's human-label and its filesystem-identity were something other than bijective", said no Linux user ever.

A filename is an identity and if identities are ambiguous, you run into trouble, examples in this thread abound.

Putting it another way, do you work with relational databases? Primary keys? Those are identities. I don't even want to think about a world in which I can't roundtrip data through a database and have things NOT come out the way I put them in. "Human labels" or not ;-)

Of course it's fine to search for things case-insensitively or otherwise normalized. And so we have "find -iname" and "locate -i" etc. etc. That's totally reasonable.


> Nothing stops you from encoding your human-labels as bytestrings

The fact that inhuman labels can sneak in stops me

> said no Linux user ever

You just weren't listening

> you run into trouble, examples in this thread abound

Bag of bytes approach is not trouble free

> Primary keys?

Do those accept any bytes in a bag?

> can't roundtrip data through a database and have things NOT come out the way I put them in

So you just want a normalisation-preserving filesystem?

But I see that you care so little about human users that you think they should learn to avoid Á outside of letters to grandma...


> The fact that inhuman labels can sneak in stops me

Stops you from what? I'm not sure what you mean there. No one mentioned inhuman. Someone, not me, instantiated a "labels for humans" concept and I roll with that, for the sake of conversation.

>> Primary keys?
> Do those accept any bytes in a bag?

Well, a "bag" (unordered) would not be the right term, but we understand what the original author meant (a bytestring), so: Yes, in fact they do. For instance, PostgreSQL:

create table bla (joyfulkey bytea PRIMARY KEY)

We don't want identifiers mangled. Should you be so unfortunate to have to use bytestrings as primary keys, then they'll come out the way you put them in. Because they're unambiguous identifiers. Not prose.

> So you just want normalisation-preserving filesystem?

I want an identity-preserving filesystem. And I have it!

> But I see that you care so little about human users that you think they should learn to avoid Á outside of letters to grandma...

I care a lot about human users. I give them nice file pickers so that they can easily choose to open files named ÁÁÁ.docx. Or search for those, using a system where both query and index is normalized (as is the common approach). And the other kind of users, also human, who find they care about the identity of files since they're writing that search engine and have to pass them in an argument vector to get the PDF viewer to open the file that the user found, will be most happy if they don't need to worry about the file system second-guessing them on identities.

PS There's a lot of you-this you-that ad-hominem things in your comments. That's not really the best way to hold a discussion.


Personally I loved how classic Mac OS stored all its file references as an "alias" data type, which used the equivalent of an inode, going as far as encoding file server connection details for auto-reconnect and a fallback path of inodes and names to traverse in case the file was removed and replaced.

But that worked in an OS that had no command line - UNIX users are allergic to binary configuration formats and demand to be able to type path names.


Do you speak other languages?

Czech language has Á, Č, Ď, Ě, É, Í, Ň, Ó, Ř, Š, Ť, Ů, Ú, Ý

Á can be represented in Unicode by its specific codepoint, or can be made up of two: the letter A and the squiggly bit.

So when a user tries to fOpen a file called ÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁ

Should my application make 2^32 requests to the file system, one for every possible combination of bytes this string could be represented as?


At this point the user will discover that easy to type unambiguously encoded filenames have value, and may opt to save the Á's for when they actually are writing a letter to grandma.

"But that's subjecting the user to limitations of the system!" you might object, and while you'd be right we should also ponder whether a filesystem is a word processor or a system for locating data using filenames as identifiers, and consequently, what sound handling of identifiers is.

And let's be realistic. Most users will choose the file to open from a file picker dialog. fopen() is not their interface, their interface is a file picker.

The intersection of people for whom fopen() is their interface (no file pickers) but for whom learning lessons on the value of using unambiguous identifiers is somehow out of reach, would be exceedingly small.


> At this point the user will discover that easy to type unambiguously encoded filenames have value, and may opt to save the Á's for when they actually are writing a letter to grandma.

The unambiguously encoded filenames wouldn't be in the natural language of the user, so you're asking the user to learn another language. Users shouldn't have to learn another language to use the computer.


I'm not 100% sure on this, but in many/most/all scripts, wouldn't there be a subset of characters in unicode for which you don't need to worry about unicode ambiguities with respect to combining characters?

For prose of course it wouldn't be acceptable if you can't use your full language, but this is a filesystem, and we're talking identifiers, not prose. We accept that for a wide range of everyday things — I don't think many people are getting worked up about web forms not accepting their phone numbers when entered in Latin numerals.

Using bytestring labels sans normalization is just the rational choice, and actually quite accommodating. People _can_ use their natural language. If you can encode your prose in bytes, then there you go, that'll be your filename ;-) and if you can't remember what diacritic-combining strategy you used when you created the file, and can't manage to fopen() it anymore by typing out the filename, use a file picker or search. And then, perhaps, go "that was a hassle, I'll avoid those fu͝n͜n͏y͝ sq̧u͜igg̶l̢įȩş for my next filename, computer are stupid" - not the end of the world, not the worst outcome.


I'm not 100% certain, either. Hasn't someone else here written that all of the Korean characters can be expressed in two ways?

Then there is Vietnamese which has a lot of accents and other diacritics. Not sure how much you can do without them.

Perhaps a language like Chinese doesn't have so many ambiguities, as in there is only one way to encode a given character in Unicode?


I speak German which has its share of pointy letters :)

I'm aware of the normalization problem, but that should be handled in the API layers above the filesystem. In which layer exactly is up for discussion though (but IMHO there should always be a fallback to access files via their "raw bytes path identifiers", if for nothing else than debugging).


Yes. Separate concerns and group logic where the knowledge and configuration to handle it for the use exist closer to the application, not scatter it throughout the system stack that shouldn't need to know anything above it.

The main ways of finding files are by enumeration and by filtering that enumeration. Normalization should selectively occur during filtering rather than permanently destroying the original bytes of the file's name. Given a binary-precise filename in a call to open() (fopen() is a library call, not a syscall), case-folding and Unicode normalization shouldn't second-guess the request or try to "help" the program. Case-insensitive filesystems are nonstandard in the industry and a dumb choice Apple continues to make.


To be fair, what a sufficiently clever application could do would be to try and naively open the file, note a mismatch, then do a directory walk comparing normalized filenames.

Still not great.
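
Roughly that fallback, as a sketch in Python (the helper name and error handling are mine):

    import os
    import unicodedata

    def open_tolerant(path, mode="rb"):
        """Try the path as given; on failure, look for a directory entry
        that matches under Unicode normalization and retry with it."""
        try:
            return open(path, mode)
        except FileNotFoundError:
            directory, wanted = os.path.split(path)
            key = unicodedata.normalize("NFC", wanted)
            for entry in os.listdir(directory or "."):
                if unicodedata.normalize("NFC", entry) == key:
                    return open(os.path.join(directory, entry), mode)
            raise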


Ideally, un-normalized paths wouldn't even make it down to the filesystem layer. I'm just saying that it shouldn't be the filesystem's job to care about text encoding details, but the layers above it (e.g. the filesystem would work in a simple "garbage in - garbage out" mode).


This is the answer. Normalization cannot and should not be guessed at by the filesystem. It's the wrong place to do it. If normalization is generally (but not universally) desirable, then the standard library (libc, etc.) should provide options to make it more convenient with sane defaults. Second-guessing users and developers with assumptions outside of its domain is a recipe for failure and endless workarounds.


…and this caused lots of problems in the Python 2 to Python 3 migration.

Command-line arguments are treated as (Unicode) strings, and file names are (sometimes) treated as bags of bytes…
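
Python 3's eventual answer was to round-trip such names through the surrogateescape error handler, exposed as os.fsencode/os.fsdecode (a minimal sketch, assuming a UTF-8 locale):

    import os

    raw = b"caf\xff.txt"          # a file name that is not valid UTF-8
    as_str = os.fsdecode(raw)     # 'caf\udcff.txt': a lone surrogate stands in for the bad byte
    back = os.fsencode(as_str)    # the original bytes are recovered exactly

    assert back == raw
    print(repr(as_str))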


I still don’t see the problem with normalization.


I guess the issue arises if you copy files between two systems, one of which does normalization and one of which doesn't. Best would be if everyone did normalization, but that ship has sailed :(


And even if both systems have normalization, the normalization changes with Unicode versions, so just different OS versions will cause filename conflicts.


I’m not aware of that. How do JS engines implement String.prototype.normalize?


Maybe each file should have two names instead of one?


MacOS at least has file ID, so there's that.


Say that each letter has two versions, and they look exactly alike. This means there are now 8 file names that look like "foo", and it's not clear which one of them you have to type in to get the file you're looking at.


That’s exactly the problem normalization solves - all variants map the same.


I'm sorry, I misread what you were saying and understood the opposite.


This isn't just an issue with the filesystem, apps also randomly produce NFD. And then you copy text from a PDF in Preview to Word, and spellchecking goes haywire.


But that’s exactly what they do now?


Such issues are why ZFS has optional normalisation support; serving files to Macs (or generally cross-platform) was always one of the major examples of its use.


Does that mean that you could actually have both versions as separate files? That would be really messy. Not to mention possibly exploitable.


Well, the ideal solution is for everyone to use NFC all the time. But failing that, having both versions is preferable to what Apple does.

What is the threat model that makes having both versions exploitable? Historically, many vulnerabilities (e.g. several git vulns) have come from filesystems doing normalization rather than the lack of it.


The possible exploit is that you could have two versions of a file, one that the user sees and one that is executed.

But generally anything that leads to unexpected behaviour can be exploited in some way, if not technically then to mislead the user.

The question is, why is 'failing that' happening here? And if there's 2 files encoded differently with the same name, how would the user differentiate them?


I had no idea this was a thing, this must be infuriating.


Designed by Apple in 8859-1


One of the things that is amazing about APFS is that some iOS updates prior to 10.3 would do a simulated conversion of the entire filesystem as part of the update process, across the entire iOS userbase. The devs would then take diagnostic data from failed mock upgrades to refine the process before the mostly non-event that 10.3 actually was.


I had heard of that before, that they did a mock upgrade.

But a tweet yesterday said they didn’t do just one mock upgrade: every single iOS update for a year did one, so they could catch and fix a ton of bugs before the big switch.

Pretty genius.


I never really used anything in the apple ecosystem until the m1 came out. It was such an interesting device that I wound up picking it up.

I was reading about macos' history with filesystems a while back and apparently they were considering switching to zfs at some point but instead decided to design apfs as their own in house next gen filesystem. In my experience with it it seems to lack almost all of the features that would have made zfs interesting. One of the most obvious usecases would be for something like time machine. I have a parallels virtual machine that is hundreds of gigabytes in size and as far as I can tell that each time you modify even a single byte in that vm it will need to backup the entire file all over again. not only is this infeasible in the amount of space it would require it's also infeasible in that every time I do a backup i would need to spend the time transfering the whole thing all over again. this is also one of the biggest problems that next gen filesystems were designed to solve. is it really the case that apple's next generation filesystem doesn't support snapshots or block level replication when one of their most obvious usecases for it would be time machine backups?


APFS does have snapshots, and at least the local TM backups make heavy use of them (it takes a fraction of a second to perform a local TM backup). TM backups to disk also seem to be organised as snapshots and have become much, much faster in recent macOS versions, so I don't think it copies everything; it probably only copies changed blocks (but I can't be sure).

What I would love for APFS to have, though, is block-level deduplication. It seems like an obvious fit for an SSD-optimized CoW system anyway; I'm surprised they haven't implemented it.


It made a lot of sense for Apple to adopt ZFS back when they were shipping spinning rust hard disks and PowerMacs had room for multiple drives, but now that they're shipping soldered-in integrated flash drives with their own storage controller in the SoC, it probably makes more sense to them to put features like encryption and error correction in their storage controller rather than the file system, and then have a file system which is far more lightweight (so they can ship the same FS on a wristwatch as on a 20-core Mac).

APFS does have snapshots but I don't think it has replication.

Time Machine is so bad and has seen so little work, I feel like Apple really wants to drop it and move everyone to iCloud subscriptions instead. And yet they still don't have a cloud backup product, which is even stranger.


> One of the most obvious use cases would be something like Time Machine. I have a Parallels virtual machine that is hundreds of gigabytes in size, and as far as I can tell, each time you modify even a single byte in that VM it will need to back up the entire file all over again. Not only is this infeasible in the amount of space it would require, it's also infeasible in that every time I do a backup I would need to spend the time transferring the whole thing all over again. This is also one of the biggest problems that next-gen filesystems were designed to solve.

This is likely an issue with Time Machine’s design rather than any limitation in APFS. Time Machine works with full files and uses hard links to link different versions. Unlike HFS+, APFS does support CoW (copy-on-write) clones and snapshots. You can verify this yourself by duplicating a file on your local drive and seeing that it doesn’t actually create one full copy of the file and occupy double the space.
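
A quick way to see the copy-on-write clones in action (from memory, so treat the exact paths and flags as an assumption; `cp -c` asks macOS's cp to clone via clonefile(2)):

    df -h /System/Volumes/Data     # note the free space
    cp -c huge-vm.img clone.img    # near-instant, regardless of file size
    df -h /System/Volumes/Data     # free space is essentially unchanged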


> Time Machine works with full files and uses hard links to link different versions

They did change it recently. It uses snapshots now (I think from Big Sur on?)


Why do Time Machine volumes inevitably become corrupt? This is a known problem but I don't remember the explanation.


It is not inevitable. I have never run across a corrupt TM volume.


I have, and I rely on Carbon Copy Cloner instead as a result.

The reports have continued for years, right into this one: https://forums.macrumors.com/threads/how-is-this-an-acceptab...

I recommend surveying the extant reports on the issue. I've read several analyses that assert that corruption is indeed inevitable, and my experience was consistent with that. But hey, roll the dice if you want. I'm just providing information.


For me, it was forgetting the password on the external hard drive. After that, iCloud got to a point where I felt like backing up became unnecessary so I stopped TM.

(I know this isn't the response you were looking for. I've also experienced corrupt TM unrelated to HD encryption, not sure why)


It’s been long enough that I can’t remember if the project was already basically dead or not going anywhere by this point, but whatever the final nail was and whether it was nailed in before or after Sun was bought, Oracle buying Sun would have just completely cremated the project no matter what state it was in.


I got to speak to a couple different Apple file system engineers many years ago over the course of many WWDCs. I think it was around the Leopard or Snow Leopard time frame (2007-2009), where we casually discussed some of the challenges Apple was having with their ZFS effort.

(We didn't talk about iPhone, so either it had not been announced yet, or development was still so closed down, that it never came up as a topic of consideration.)

My recollection was that with their ZFS work, while it was clear to everybody that Time Machine should be a big beneficiary of ZFS, they were still unhappy with the high RAM requirements (and also not thrilled about the high CPU requirements). Since laptops were then the bulk of their sales, and those machines shipped with much more constrained specs, there was a lot of uncertainty about when/if they were going to make ZFS the default file system.

(Looking it up, the base 2006 Macbook had 512 MB of RAM, and shared RAM with the Intel GMA 950 GPU.)

I recall there always was a secondary backdrop of concerns with the ZFS license and also the fate of Sun, but the engineers I spoke with weren't involved in those parts of the decision making process.


> I recall there always was a secondary backdrop of concerns with the ZFS license

From one of the co-creators of ZFS:

    > Apple can currently just take the ZFS CDDL code and incorporate it  
    > (like they did with DTrace), but it may be that they wanted a "private  
    > license" from Sun (with appropriate technical support and  
    > indemnification), and the two entities couldn't come to mutually  
    > agreeable terms.
    
    I cannot disclose details, but that is the essence of it.
* https://web.archive.org/web/20121221111757/http://mail.opens...

* https://arstechnica.com/gadgets/2009/10/apple-abandons-zfs-o...


That's what the engineers say.

What the lawyers say is different.

What the business folks who control the license say was very different. FUD and an unforeseen lawsuit hanging over your head at some random point in the future when Sun or Oracle needs a revenue lift at an end of a quarter is a recipe for ulcers.


Not unforeseen nor FUD. There was a very real patent lawsuit between NetApp and Sun/Oracle related to ZFS.

https://www.theregister.com/2010/09/09/oracle_netapp_zfs_dis...


> Thats what the engineers say.

Bonwick was the Sun Storage CTO and after the acquisition a vice president at Oracle.


At some point APFS was also said to be designed to take advantage of the special characteristics of flash storage devices, or to reduce their wear and tear, unlike ZFS, which is a design from the spinning-rust era, though I couldn't tell you how, specifically.


Time Machine continued to use HFS+ until recently; I think TM on APFS was introduced in macOS 11, and it requires starting a new backup from scratch. However, it should support block-level diffing.


I don't know anything about APFS, but I can't blame Apple for deciding to stay the hell away from Oracle


They ended up backing away from ZFS because of patents and Oracle. The irony.


It might be due to licensing; even Linux went with Btrfs (not sure if they've mainlined ZFS yet).


Not mainlined now, and probably not ever given that Linus has said "considering Oracle's litigious nature, and the questions over licensing, there's no way I can feel safe in ever doing so"

https://arstechnica.com/gadgets/2020/01/linus-torvalds-zfs-s...


APFS has snapshots, just not block-level snapshots.


A brief history of APFS version numbers, more like. Doesn't seem to tell us much about APFS as a thing, other than saying 'there was a bug in this version'.


It seems to mostly work and get out of the way, but there's one thing I'd really like them to add.

There's a way to explicitly compress files if you want. You can get `afsctool` and run it on the directory with your sources, for example, and gain some space back. Unfortunately, this gets applied per-file only. There's no way to mark a directory to always compress everything in it, and I don't think you can do it per-volume either.

It would be great if those options got exposed to the user. I'm not going to hold any breath though, it may be one of those "Apple knows better" issues.


Transparent file compression is nothing new; if anything it’s filesystem agnostic assuming what you’re running on supports resource forks. In fact in many ways it was really designed for HFS+; many of the more esoteric ways to compress files don’t make sense on APFS.


HFS wasn't great about the compression. You got 0-size files with some extra attributes, which is not always usable. APFS is actually an improvement here - mainly because apps do not need to know about resource forks.


It's meant to be transparent to applications that aren't aware of how it works; the mechanism hasn't changed (you can store data in either the xattr itself or the resource fork).


It would be hard to do directory-wide: how would you handle a hard link, for example?


Choose one behaviour and document it?

Hard links are such a rare exception in any system that either ignoring them or compressing regardless would be fine.

Or check what others have done. You can do "btrfs property set /path compression zstd", so someone's solved it before.


Compress it since it's just a regular file, doesn't matter that it exists elsewhere

Or add a config option.

Why is that hard?


When it came out on Mac, I had a Mac Mini with a tiny internal drive (120 gigs). I tried to make an APFS Fusion drive where I fused the internal plus a fast USB-attached thumbdrive together. That seemed to work but would then fail upon reboot, because macOS would renumber devices and then couldn't find the drive.

Under Linux, you can simply pin a particular drive to /dev/sdX by creating a udev rule. Under macOS, that's impossible as far as I know.


The important question: even controlling a walled garden, how does one roll out an entirely new file system that's never seen widespread commercial or retail use?


Lots of testing, up to and including, as someone else mentioned, doing dry-run migrations on a very large number of user devices. Auto-updating the file system on OS upgrade, once that was done, almost certainly simplified matters too.


I remember "free" dir sizes being one of the supposed benefits. Has that materialized? Are there file managers/tools that can show the size of any dir immediately, just like you can with Everything on NTFS+Windows?


Given how long it takes an iPhone to calculate local storage usage on iOS, I assume not.


I don’t really understand how any file system designed in the last 20 years could be lacking data checksums. Data integrity is job #1, and yet.


Isn’t that offloaded to the underlying hardware?

Presumably they have sufficient control at that level for their current lineup, but I’m not sure what it means for aftermarket disks and external disks.


Yes, it's a shame. I'm guessing Apple takes care of that with their own custom SSD firmware, but I'd love to take the additional precautions. For now, I add par2 error correction/detection to my pictures.


Does anyone use APFS snapshots in anger? Have you ever used them at all, or do you use them routinely in any kind of workflow? What are you using them for, and how do you use them?

Just curious how folks apply this day to day outside of how Apple leverages it.
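
(For context, the command-line surface I'm aware of is tmutil, which manages the snapshots Time Machine itself creates; this is from memory, so treat the exact subcommands as an assumption:)

    tmutil localsnapshot                             # create an APFS snapshot of the current volume
    tmutil listlocalsnapshots /                      # list snapshots on the root volume
    tmutil deletelocalsnapshots 2023-03-28-120000    # delete one by its date stamp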


Needs a (2022) in the title.


Remember when Apple announced that ZFS would be the new filesystem for Mac OS?

Oh well.


> Remember when Apple announced that ZFS would be the new filesystem for Mac OS?

I don’t. They worked on ZFS, even released read only support, and some people think they came close to making such an announcement (e.g. http://dtrace.org/blogs/ahl/2016/06/15/apple_and_zfs/) but I don’t think they ever made any statement that ZFS would be the replacement for HFS+.



Everyone wanted that, but it fell through due to a failure to agree on licensing terms.

https://news.ycombinator.com/item?id=17852019

Apple's not big on f/oss. They maintain their hardware moat (historically) by differentiating with extremely proprietary software.


From one of the co-creators of ZFS:

    > Apple can currently just take the ZFS CDDL code and incorporate it  
    > (like they did with DTrace), but it may be that they wanted a "private  
    > license" from Sun (with appropriate technical support and  
    > indemnification), and the two entities couldn't come to mutually  
    > agreeable terms.
    
    I cannot disclose details, but that is the essence of it.
* https://web.archive.org/web/20121221111757/http://mail.opens...

* https://arstechnica.com/gadgets/2009/10/apple-abandons-zfs-o...


A friend of mine randomly had his home partition corrupted after upgrading to APFS on macOS ¯\_(ツ)_/¯


Should have gone with ZFS. Shit, Steve should have bought Sun.



>Steve should have bought Sun

That's interesting to think about. Would have propped up the xServe idea better. A timeline where an Apple server was a somewhat credible competitor to Linux servers might have changed some things.


Given the disaster that would have unfolded had Sun bought Apple, I think Apple not buying Sun later is a fair trade for the timeline we lucked into.


Disaster for Sun or for Apple? Sun vanished anyway.


Disaster for Apple (and I’d argue the industry as a whole; Apple makes things much more interesting than the Wintel duopoly would have managed without them).


Apple could have had ZFS, which is better than APFS, and also had SPARC, which is more promising than Wintel or even ARM for high-performance applications.


One can dream :')



