Apple File System Reference [pdf] (apple.com)
248 points by abkumar 6 months ago | 64 comments

In the past, I implemented HFS+ on an embedded RTOS platform. Technical Note TN1150 [1] proved to be an extremely useful asset. From a cursory look, I feel TN1150 was much more detailed, and perhaps can be treated as a prerequisite to this document. At least it should have been mentioned in this document.

[1] https://developer.apple.com/library/archive/technotes/tn/tn1...

You got me curious, can you explain why you needed to implement HFS+? Was it read only?

I was working on a video recorder product that could record videos directly to iPods, iPhones and other media players. iPods were formatted as HFS+ if you connected them to a Mac out of the box, and as FAT32 if you connected them to a PC. The platform I was working on was an RTOS, and we had to develop both FAT32 and HFS+ from scratch. It was not read-only; it supported both read and write.

Nice, that sounds fun, thanks.

This seems like it's probably enough to re-implement APFS on non-Mac platforms; in fact, the about page (page 6) says as much.

Kudos to Apple for providing the information. It's a hell of a lot better than reverse engineering the thing (see how many years it took to get NTFS down...)

There is already one third-party implementation in the works. They will surely find this helpful in their efforts.


This is user space – hopefully a kernel level one will come out as a result of this.

Or does it matter, performance-wise? People that know more can chime in.

Yes, FUSE is slower than an in-ring-0 implementation, if only because of the sheer overhead of the syscalls.

IIRC, it’s relatively straightforward to migrate to a module though?

Agreed. I was honestly expecting a more hand-wavy explanation, but I was surprised that they dived into so much detail on, e.g., the structs used to represent various objects in APFS.

Likely enough to create third-party disk recovery utilities.

From the second paragraph of the PDF:

"This document is for developers of software that interacts with the file system directly, without using any frameworks or the operating system—for example, a disk recovery utility or an implementation of Apple File System on another platform."

It's a little light. It reminds me of some vendor GPU documentation that explains the names of constants and layouts, but not as much how the pieces fit together, and the gotchas of the interrelated data structures. And that's what tends to be the hard part anyway.

At last! Apple's old APFS docs always had this mysterious note about Fast Directory Sizing:

"You cannot enable Fast Directory Sizing on directories containing files or other directories directly; you must instead first create a new directory, enable fast directory sizing on it, and then move the contents of the existing directory to the new directory."

but there was never any documentation on how to do this, and no Apple engineer would say. The most common internet theory seemed to be that this feature was purely automatic, and all mentions (like this) in the docs were just incredibly misleading.

Now it seems we have an answer, in this flag: "INODE_MAINTAIN_DIR_STATS: The inode tracks the size of all of its children."
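As a rough illustration of what "tracks the size of all of its children" could mean, here is a toy in-memory sketch. The `Dir` class and its fields are hypothetical; this says nothing about how APFS actually stores or updates these statistics on disk.

```python
class Dir:
    """Toy directory node that keeps a running total of its subtree size."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.total_size = 0  # bytes in all descendants, kept up to date

    def add_file(self, size):
        # Propagate the size change to this directory and every ancestor,
        # so reading total_size never needs a tree walk.
        node = self
        while node is not None:
            node.total_size += size
            node = node.parent


root = Dir("root")
photos = Dir("photos", parent=root)
photos.add_file(4096)
photos.add_file(1024)
root.add_file(512)
print(photos.total_size, root.total_size)  # prints 5120 5632
```

The trade-off in a scheme like this is that every write pays a small cost up the ancestor chain so that a size query is a single field read.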

The 'Fusion' section at the end is interesting. macOS 10.13 didn't convert Fusion drives to APFS, but 10.14 will. Presumably this specific support for Fusion drives is new in 10.14.

HFS+ had no knowledge about Fusion drives, the caching was handled entirely at block-level by the lower CoreStorage layer (although later versions did add some flags so CoreStorage could pin metadata/swap blocks to the SSD).

Now what I'm really interested to see is if they open-source the filesystem driver along with the macOS 10.14 code drop. HFS+ (and its utilities) has always been open-source, last year APFS was not.

The only bad thing about Apple supporting APFS on Fusion Drives is that it gives them an incentive to continue selling future iMacs and Mac minis with Fusion Drives.

Having had to replace failed HDDs in Fusion Drive iMacs at work, it's certainly no fun. For all new Mac purchases I ensure they are SSD only now.

On that note, I am surprised that they added Fusion-awareness to APFS, rather than just putting APFS on top of CoreStorage.

It certainly is better to have the filesystem aware of the Fusion situation, but...measurably, significantly better? Would the experience have been significantly worse without it? 10.13 betas allowed APFS use on Fusion drives, presumably without any Fusion-awareness in the FS.

I'm surprised, but happy to see they did it.

There are several cases where the file system has monotonically increasing integers that, when overflowing, are unrecoverable errors.

Those counters are always 64 bits and won’t overflow in normal use (for example, the text says: “if you created 1,000,000 transactions per second, it would take more than 5,000 centuries to exhaust the available transaction identifiers.”), but I can see people making ‘interesting’ disk images, for example ones where writing to a specific directory is impossible or, depending on how the implementation handles it, even panics the OS.
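The quoted figure is easy to sanity-check. A back-of-the-envelope sketch, assuming an unsigned 64-bit counter that starts at zero and a 365.25-day year (both assumptions of mine, not statements from the reference):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, an assumption


def centuries_to_exhaust(bits=64, per_second=1_000_000):
    """Centuries until a counter of the given width runs out of values."""
    total_ids = 2 ** bits
    seconds = total_ids / per_second
    return seconds / SECONDS_PER_YEAR / 100


print(round(centuries_to_exhaust()))  # prints 5845
```

That agrees with "more than 5,000 centuries". A crafted image would not need to wait, of course: it could simply ship with the counter already near its maximum.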

One of my long-time favourite macOS applications -- iDefrag -- had support withdrawn shortly after APFS appeared. Reasons cited were lack of an APFS spec and increased System Integrity Protection.

I fear this is too little, too late to have iDefrag make a comeback. I understand defragmenting an SSD typically does more harm than good [edit: and I only defrag spinning drives], but nothing touched it for effectiveness on spinning drives.



I haven't defragged a disk in almost a decade. When is defragging an SSD ever a good thing?

I only have spinning rust in my (older) NAS or as backup drives.

I don't defragment SSDs at all. I should've made that clear in my comment.

I have a multi-HDD system running to 6TB of storage, I can't run to the £ of an all-SSD system at the moment, but that's on the cards for the future, so it's spinning rust for me, for now.

Probably often but there is no way to tell really because the controller has a mind of its own and they are all different.

> Probably often

Defragging an SSD is often a good thing? Why? It would seem to greatly increase wear for no benefit.

If your fragments are really small like 16kb then you could see a significant performance improvement, due to better predictive loading and packet overhead. This is clear from SSD benchmarks.

But I don’t see how this would ever realistically happen.

What about the massive effect on wear?

You don’t know if there is wear. The controller is a separate computer. It may be clever and just remap blocks if you copy or move them around. It could be that defragmentation allows the system to work in a way that is better for the controller or allows that controller to place the blocks in a better way.

HFS+ has had on-the-fly defrag for ages now. Did you have issues with it? What kind of files do you work with?

Here's the essential business logic specification for HFS+ automatic de-fragmentation, for the curious.

It's actually quite a clever spec, because it takes advantage of existing efforts to read fragmented files to perform the majority of the de-fragmentation process.

I'm not sure if this spec applies to APFS or to SSDs. (With SSDs you're generally better off not defragmenting most of the time, because the performance penalty is far lower, but the write amplification has consequences.)


When a file is opened on an HFS+ volume, the following conditions are tested:

  If the file is less than 20 MB in size
  If the file is not already busy
  If the file is not read-only
  If the file has more than eight extents
  If the system has been up for at least 3 mins
If all of the above conditions are satisfied, the file is relocated—it is defragmented on-the-fly.
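The conditions listed above amount to a simple predicate. A minimal sketch, with hypothetical names throughout (`OpenedFile`, `should_relocate`), and assuming "20 MB" means 20 × 1024 × 1024 bytes; this is not the actual kernel code:

```python
from dataclasses import dataclass

MAX_SIZE_BYTES = 20 * 1024 * 1024   # "less than 20 MB"; MiB is an assumption
MIN_EXTENTS = 8                     # relocate only with more than eight extents
MIN_UPTIME_SECONDS = 3 * 60         # system up for at least 3 minutes


@dataclass
class OpenedFile:
    size_bytes: int
    busy: bool
    read_only: bool
    extent_count: int


def should_relocate(f: OpenedFile, uptime_seconds: float) -> bool:
    """Return True only when every listed condition holds."""
    return (
        f.size_bytes < MAX_SIZE_BYTES
        and not f.busy
        and not f.read_only
        and f.extent_count > MIN_EXTENTS
        and uptime_seconds >= MIN_UPTIME_SECONDS
    )
```

For example, `should_relocate(OpenedFile(5_000_000, False, False, 12), uptime_seconds=600)` is true, while the same file with exactly eight extents is left alone.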



A moderate amount of my work involves video training and the component parts to build, so raw video and audio. It's subjective, but there are times that video and audio stutter a bit and it's not down to the available CPU or RAM (plenty of headroom there).

> I fear this is too little, too late to have iDefrag make a comeback

Defragging has been snake oil for more than a decade, anything that hastens its demise is a good thing.

My computer (macOS Sierra, HFS (Journaled)) has 2x spinning drives of 3TB, and operations are noticeably quicker after an overnight defragmentation run, which I do twice a year or so. Anecdata, granted - but it works for my situation.

If it works for you, then great, but here's an article from almost 10 years ago where Apple discourages defragging[1]. There's a bunch of other articles with more specifics showing that advanced defrag features from the 90s were built into the OS years ago[2].

[1] https://support.apple.com/en-us/HT1375 (last updated 2010)

[2] http://osxbook.com/software/hfsdebug/fragmentation.html (from 2004)

I just took a snapshot of one of my hard drives (2TB) from iDefrag:


The red slivers towards the lower middle are where the reported fragmented files are located, and the lower right is the fragmented files listed by number of fragments descending.

The anonymised .mkv files are training videos with multiple language and subtitle streams. They're exported to a scratch drive and then copied to the drive they currently reside on.

After a full, offline defrag (including a b-tree rebuild) the legend for defragmentation is a neatly arranged list of that blue/grey colour, no red to be seen.

Given that macOS automatically defragments sub-20MB files on the fly (which covers 99% of the files which affect perceived system performance) I'd wager money that your experience is 100% placebo.

I know from personal experience how powerful this placebo effect can be.

>Given that macOS automatically defragments sub-20MB files on the fly (which covers 99% of the files which affect perceived system performance) I'd wager money that your experience is 100% placebo.

Perhaps. I can certainly tell the difference in performance when an overnight defrag run has finished, and I have no beef with you to prove or disprove a point.

If you're ever in North Cornwall, UK: drop me a line and I'll show you a before and after over a mug of coffee/tea/etc.

You have not used a file system exceeding 90% use with large files and lots of little files coming and going.

ZFS in particular completely falls off a cliff somewhere between 80% and 90%, due to the copy on write nature of ZFS always allocating and freeing small bits of space. That creates the little gaps all over the FS which murder performance when the big gaps run out.

The old default threshold for this was lower, but with the defaults in most modern versions of ZFS you don’t hit this until the vicinity of 96%.

It’s not technically the little holes or even CoW that is the problem. ZFS simply switched from a “first fit” to a “best fit” algorithm as it got full, and the best-fit search was quite expensive.
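To make the cost difference concrete, here is a naive sketch of the two strategies over a list of free extents. It is a generic illustration with hypothetical function names, not ZFS's allocator, which uses more elaborate data structures:

```python
def first_fit(free_extents, size):
    """Return the index of the first free extent large enough, or None."""
    for i, (start, length) in enumerate(free_extents):
        if length >= size:
            return i  # stops at the first match
    return None


def best_fit(free_extents, size):
    """Return the index of the smallest free extent that still fits, or None."""
    best = None
    for i, (start, length) in enumerate(free_extents):
        # Must look at every extent to be sure nothing tighter exists.
        if length >= size and (best is None or length < free_extents[best][1]):
            best = i
    return best


# (start, length) pairs; values are made up for the example
free = [(0, 64), (100, 16), (200, 8), (300, 10)]
print(first_fit(free, 8), best_fit(free, 8))  # prints 0 2
```

In this naive form, first fit can return early while best fit always inspects the whole list, which hints at why the switch hurts once the free space is chopped into many small pieces.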

I’m still sad that this filesystem does not contain file data checksums. It looks like we will be stuck with it for some years to come.

It does for metadata, but yeah, even if off by default, there should have been at least an option to turn it on.

Presumably data integrity is ensured with encryption, which is not covered by this document.

Care to enlighten ignorant me why we would want that?

Simple, to help prevent “bit rot”. The problem is exacerbated further in that many of us treat cloud sync services as backup, which they arguably aren’t - they can inconveniently just spread the decay.

I’d also hoped that a next generation file system from Apple would have had more to say on this topic, but it seems like features that promote their iOS device agenda took front seat over less “sexy” features like data integrity.

In the days before iOS devices dominated OS level decision making at Apple there was an assumption that Apple might adopt ZFS as their next generation file system, which is apparently much better in this regard. There’s various evidence of a cancelled MacOS ZFS project scattered throughout past MacOS releases.

> https://en.wikipedia.org/wiki/Data_degradation

> https://arstechnica.com/gadgets/2016/06/zfs-the-other-new-ap...

Word on the street is that Apple's ZFS integration was mostly finished and it was going to be announced at WWDC. Sun open-sourced ZFS under the CDDL. But then Oracle bought Sun, and Apple's lawyers wanted to make sure Oracle wouldn't try to sue them over ZFS somehow anyway. Negotiations between Apple and Oracle for a clear ZFS license fell through. Without legal go-ahead, the feature was pulled from macOS at the 11th hour and buried.

When ZFS was open-sourced under the CDDL, lots of people complained that they should have chosen a clearer, more permissive open-source license. Other people said it was fine, because the license was good enough and Sun is full of good people. The way everything played out, it's clear the first group's concerns were valid.

It's a huge shame. ZFS is a fantastic piece of engineering. It was ahead of its time in lots of ways. It would take years for btrfs to become usable and for APFS to appear on the scene. If not for the weird licensing decision, ZFS would almost certainly have landed in the Linux and macOS kernels. We almost had a ubiquitous, standard, cross-platform filesystem.

For more history about Sun and Oracle, this talk by Bryan Cantrill is a great watch: https://www.youtube.com/watch?v=-zRN7XLCRhc

Might it not be the case that combating bit rot is best done at higher layers of the stack similar to how it is best done at levels higher than the IP layer in a networking stack?

For example, data painstakingly entered by the user a character at a time with a keyboard might deserve more redundancy than, say, a movie downloaded by iTunes.

I’m no expert, but my understanding is that bits can “flip” and introduce errors due to things as unpredictable as background radiation etc, even on files the system has had no interaction with, which is why it’s kind of desirable to implement this kind of integrity check at the file system level. A higher level check may be completely unaware of this kind of passive background error.

Also, if I pull the drive and move it to another machine, again it’s kind of nice if the data integrity features are tied to the drive format rather than higher level software. I don’t think it’s too unreasonable to expect the file system to make sensible guarantees that the sequence of bytes I record today will remain the same until I next interact with them.

I’m not sure how appropriate comparisons with IP error correction are either; it’s a markedly different class of problem really (you are not dealing with long term storage issues at all).

Not OP, but probably to make sure the contents of a file are not changed by hardware errors.

Yep, otherwise you aren't able to detect bit level errors unless they impact the metadata, and the metadata is a tiny fraction of your total storage.
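A small sketch of the idea: keep a checksum per data block at write time and compare later. The block size and the choice of CRC32 are assumptions made for the example; real file systems that checksum data (ZFS, for instance) use their own formats and algorithms.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size for the sketch


def checksum_blocks(data):
    """CRC32 for each fixed-size block of data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def corrupted_blocks(data, stored_checksums):
    """Indices of blocks whose current CRC32 differs from the stored one."""
    return [i for i, (now, then)
            in enumerate(zip(checksum_blocks(data), stored_checksums))
            if now != then]


original = bytes(range(256)) * 64   # 16 KiB of sample data
stored = checksum_blocks(original)  # what would be saved at write time

rotted = bytearray(original)
rotted[5000] ^= 0x01                # flip one bit in the second block
print(corrupted_blocks(bytes(rotted), stored))  # prints [1]
```

Detection alone does not repair anything; fixing the block still needs a redundant copy or parity somewhere.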

That said, your hard drive already does block level checksumming so doing it at the FS layer is mostly redundant unless the errors are being introduced in your SATA controller or on the PCI bus.

You would still need end-to-end integrity checking, unless your Mac came with ECC memory (which it probably didn't).

Memory errors are still a concern; however, RAM is not used for persistent storage.

If a bit flip occurs during the path to storing data, that could get persisted. That's a moment in time, though. Maybe you'll notice the document you just wrote seems corrupted, or just has a typo.

But if you write successfully to disk, you are trusting that data to stay there long-term. If years later your drive corrupts a bit, you may have a very hard time noticing. Bad RAM manifests as computer instability, and you can just replace RAM without data loss, as nobody is permanently storing data in RAM.

Because the data spends so much longer on disk than in RAM, the chance of a bit flip affecting stored data is correspondingly higher.

It takes bad luck for sure, but I once ruined a bunch (a big bunch) of my photos by syncing them to a NAS with faulty RAM. It was a Synology DS212 I think, back in 2012. Mind you, the device didn’t produce symptoms other than messing up regularly spaced bytes in the transferred files.

I am super paranoid about this kind of stuff and don’t consider a copy finished until it is first done copying then also passes an independent rsync -c.
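For anyone who wants the same kind of check without rsync, here is a minimal sketch that re-reads both trees and compares content hashes. The function names are hypothetical and SHA-256 is my choice for the example; it is not what rsync uses internally.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatched_files(src_dir, dst_dir):
    """Relative paths under src_dir that are missing or differ in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    bad = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            bad.append(rel.as_posix())
    return bad
```

An empty list from `mismatched_files(source, copy)` means every source file has a byte-identical counterpart. It does not flag extra files that exist only in the copy.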

For my family photos, I create par2 files. The rest, I don't care so much.

I've recently listened through an old-but-good episode of the Hypercritical podcast with John Siracusa's informative rant about this very topic: http://5by5.tv/hypercritical/56

I've just been going through Siracusa's old OS X reviews; he talks about this a lot, right back to the earliest days.

The EFI jumpstart is particularly clever. A straightforward recipe for locating and verifying the file system driver, and then once executed the UEFI pre-boot environment can fully navigate an APFS volume.

Uhhh, personally I'd prefer that UEFI stay away from the OS particulars. But anyway, afaik Windows' boot code had about the same feature―at least it certainly did in regard to the chipset drivers, the result being IIRC that the OS wouldn't boot if you moved the partitions a bit.

The pre-boot environment needs to find the kernel and initramfs somehow. My guess for how Apple is booting from APFS, now that it's all APFS, without a separate recovery partition? They've got this minimalist EFI jumpstart code in the firmware, it loads the EFI file system driver for APFS, and now it can locate the bootloader, kernel, and kext cache.

For a long time Apple has had an HFS+ driver baked into the firmware. The way APFS is implemented with EFI jumpstart, they've got much less filesystem code in firmware.

It's nice to see that this is finally up; I know a lot of people have been clamoring for a more detailed reference for a while and this should hopefully make it easier for them to interact with APFS.


Should be San Francisco.

Every time I see the San Francisco font mentioned anywhere, I don't think of Apple's rather lovely neo-grotesque, but rather Apple's original San Francisco. (Which is also lovely, in an oh cool, this 1984 era computer has a variety of fonts kind of way.)


And San Francisco Mono for the monospaced font.

Good find. Wonder if there's as much doc of ZFS/OpenZFS.

Indeed there is: http://www.giis.co.in/Zfs_ondiskformat.pdf

It glosses over and assumes knowledge of XDR from an external source. That is documented here: https://tools.ietf.org/html/rfc1014.html
