Apple File System (developer.apple.com)
1094 points by sirn on June 13, 2016 | 393 comments

> Flash / SSD Optimization

I.e., a "unique copy-on-write design"

> Space Sharing

Basically, ZFS datasets.

> Snapshots

If those can be sent: Finally Time Machine done right.

> The AFP protocol is deprecated and cannot be used to share APFS formatted volumes.


> An open source implementation is not available at this time. Apple plans to document and publish the APFS volume format when Apple File System is released in 2017.


Limitations: https://developer.apple.com/library/prerelease/content/docum...


> encryption models for each volume in a container: no encryption, single-key encryption, or multi-key encryption with per-file keys for file data and a separate key for sensitive metadata

Nice. I hope they also include checksums for each block.

Famously missing, but not the hardest thing to add considering all the features above: Compression (which HFS+ supports!)

> > Snapshots

> If those can be sent: Finally Time Machine done right.

If this ends up being true I can't wait—it's so frustrating watching tiny incremental backups take forever over the network. It seems like "Preparing" and "Cleaning up" take longer than moving the data.

> it's so frustrating watching tiny incremental backups take forever over the network

While waiting for APFS to become stable, buy Carbon Copy Cloner. $40. I love it. It's fundamentally rsync, but tailored to OS X. For personal use a single license covers an entire household.

Every night CCC fires up on each laptop, and each does an incremental clone to its own dedicated directory on my desktop machine. This clone is usually less than a few GB and takes about a minute to run. CCC doesn't have to run daily; it can be told to run hourly instead.

Time Machine running on my desktop then copies everything off to yet another disk.

So my backup environment is:

   each laptop uses CCC to periodically clone to desktop

   the clones on the desktop are "traditional",
   CCC is told not to keep its own copies of modified files

   desktop runs time machine,
   makes hourly backups to a TM disk,
   old versions of files can be found there
This gives me 3 copies of all data on each laptop:

1) on the laptop

2) on the desktop

3) on the desktop's Time Machine Volume

If you create your users in the same order on each machine, or later go back and change the user IDs to match (under Advanced Options in Users & Groups), then the UIDs will match everywhere, and all files can be easily browsed in each location with all file permissions intact.
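If you'd rather fix ownership after the fact than match creation order, the bulk re-own can be scripted. A minimal sketch (run as root; the function name and example UIDs are hypothetical):

```python
import os

def reown(root, old_uid, new_uid):
    """Walk root and re-own every path carrying old_uid; returns the
    number of paths changed. lchown avoids following symlinks."""
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_uid == old_uid:
                os.lchown(path, new_uid, st.st_gid)
                changed += 1
    return changed

# e.g. reown("/Users/alice", 502, 501) after aligning alice's UID to 501
```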

18 years ago, as a teenager, I misconfigured CCC, only to find out after a full erase/reinstall upgrade of Mac OS that only the active user's folders had been synced.

This was my epic "there are two kinds of people: those who have lost data and those who will lose data" story. All my dad's and mum's files were gone; luckily nothing professional and no pictures (analog cameras still ruled back then).

This is just a reminder that no tool is ever perfect. CCC is great; Time Machine is sluggish but "dumbproof" to a certain extent (though you have to keep faith in a black box).

Personally I'm pretty paranoid, so I perform my monthly off-site backup by rebooting my Mac into the recovery partition and doing a full disk copy to an external drive with Disk Utility (because ultimately those are the recovery tools I'll use if things go bad, so I want my off-site backup made with them...).

It's really sluggish, so I run it while I sleep. I get that professionals with more than 1 TB will need to get into some more serious stuff anyway.

I appreciate this was a long time ago and you've learned a lot since, but your story perfectly illustrates the importance of not just performing backups, but periodically testing them as well. That way you eliminate your blind faith in a black box by ensuring you actually have a working solution to fallback on.

In my case (and I can't emphasize my teenager status back then enough) it was a permission issue: everything looked fine from my perspective, but CCC somehow didn't have the right permissions to sync the folders of other users, so their home folders were actually empty.

From my perspective the backup was a success... This traumatizing error is when I first learned about the superuser and permissions. I guess CCC has come a long way since then and added more safeguards. But I must confess that I've stuck to "use the goddamn standard tools" ever since.

Sure, there must be a lot of more customizable or faster tools out there. But Time Machine stays "the backup tool my mum can't use wrong", and I can't give Apple enough credit for that. And putting aside the NSA/FBI problem, iCloud backup for iPhone is exactly as smooth. Literally every time I go to a Genius Bar, someone is about to hug a Genius because of a successful restore of their broken/stolen iPhone from iCloud.

This is how you build a "faithful" customer base, and no other tech company understands that better than Apple so far.

Those points don't change the fact that you (and everyone else) should test backups periodically.

If it's not file system permissions or other configuration problems, then it will be a failing storage medium or just dumb user error. So the only way to be sure you have a working backup is to test that backup, ideally before you actually need to use it.

To be clear, I can't agree more!

I got a little lost while writing, but indeed my point was: double-check! Because in my case everything looked fine at first sight.

This is even why I don't use the same process for my two backups (daily/monthly). To be fair, you can't constantly check that your daily/hourly backup isn't corrupted; you rely on its built-in safeguards and check it occasionally.

But by using two different methods for the daily vs. monthly (off-site) backups, you significantly reduce the odds that both methods have failed at the same time when you need them. (Also, a monthly full clone is way easier to check than a Time Machine disk.)
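The back-of-envelope version of that odds reduction, with made-up failure rates (a sketch, assuming the two methods fail independently):

```python
# Hypothetical numbers: suppose each backup method, inspected at a random
# moment, turns out to be silently broken 5% of the time.
p_daily_broken = 0.05
p_monthly_broken = 0.05

# If the two methods share no code or configuration, their failures are
# roughly independent, so the probability that BOTH are broken at the
# moment you need them is the product:
p_both_broken = p_daily_broken * p_monthly_broken

print(f"one method broken:   {p_daily_broken:.2%}")   # 5.00%
print(f"both broken at once: {p_both_broken:.2%}")    # 0.25%
```

Using the same method twice would share failure modes (one permission bug breaks both copies), which is exactly what the independence assumption rules out.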

Yeah, once upon a time at a startup, I wanted a file restored. We were small; the CEO was actually the one doing the backups (to QIC tape cartridges).

Of course, the recovery failed. In fact, the backups hadn't been working properly for months. I lost a few hours of work; if our file server had actually failed, not having a good backup could have literally been catastrophic. As in would the company have survived?!

I recall an ancient quip, more or less: "if you don't test your backups, you don't have backups, you have dreams".

I keep hearing this, but as a Mac user who uses Time Machine and CCC, what's a good way to test the backups? After all deleting my only system to attempt a restore seems even more dangerous.

> what's a good way to test the backups?

Here's a way to do it. You need to feel comfortable around the OS X terminal command line. You only rely on OS X's built in programs, so it's an independent way to check if your backup program is doing the right thing.

As root, I used to do something like this:

   cd /
   find -x . -type f -print0 \
      | xargs -0 -n 100 -x md5 \
      | sort > /tmp/src.md5
Then I would run the same type of script in the destination directory and diff the results.

There are some annoyances. E.g. (from memory) the ~/Library/Caches files aren't backed up, so they will be missing at the destination. I wrote some sed commands to first remove some of these from the resulting checksums to keep my diffs more manageable.

Instead of the above, I currently use a Python script that I wrote, which lets me fine-tune things. At the heart of it is (as root, of course) using Python's os.walk to traverse a directory tree. For each file I use hashlib.sha256 to generate a checksum. I also don't descend into certain directories. Etc.

Using a few Python scripts to generate and process the source and destination checksums allows me to, at the end, use a simple

   vimdiff source.checksums destination.checksums
to quickly see the few differences that remain.

Keep in mind that the perfect is the enemy of the "good enough". Just using the basic 'find' (and not a custom Python script) is plenty good. I used that method for many years.

Many errors stick out right away. E.g. if you have 1,000,000 files checksummed in your source but only 250,000 in your destination, then you quickly know that you screwed up.
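The os.walk/hashlib approach described above can be sketched in a few lines (SKIP_DIRS is a hypothetical exclusion list; extend it with whatever your backup tool legitimately skips):

```python
import hashlib
import os

# Walk a tree, hash every regular file, and emit sorted "checksum  path"
# lines so that two manifests can be compared with diff or vimdiff.
SKIP_DIRS = {"Library/Caches", ".Trash"}

def _rel(base, name):
    # Relative path without a leading "./" so exclusions match cleanly.
    return name if base == "." else base + "/" + name

def manifest(root):
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        base = os.path.relpath(dirpath, root)
        # Prune excluded directories so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if _rel(base, d) not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):   # skip sockets, dangling links, ...
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            lines.append(f"{h.hexdigest()}  {_rel(base, name)}")
    return sorted(lines)

# Write manifest("/") and manifest("/Volumes/Backup") to two files, then:
#   vimdiff source.checksums destination.checksums
```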

Buy a new hard drive and restore to that?

Not trying to troll, but I see so many HN comments about backups and I just don't have this need anymore. What are people using traditional backup software like time machine, carbon copy cloner, etc. for on their laptop these days?

I use Google Docs for all my docs and spreadsheets; occasionally I use Excel, Word, or Keynote, but when I do I save the files to my Dropbox or Google Drive folder. My photos are synced with Google Photos, my music through iTunes Match (or I can sync my music library files to Dropbox, or use Spotify), and all my code is in git repos pushed to GitHub or Bitbucket.

What else is there to back up? Is it a matter of not wanting to re-install apps manually to get your computer back to its current state? Or is it a matter of not wanting to pay for the extra storage space on dropbox?

- Some people don't want to use the cloud to store their stuff for privacy / ethical / commercial / legal reasons.

- Some others have been burnt by the cloud failing them (corruption, data loss, copyright abuse, etc.) and want an additional layer of security.

- There are many activities that are not fully covered by the cloud, such as 3D / video / music editing, art in general, programming (all those dev environments)...

- You may want / need to be able to recover from a data loss even if you are offline.

- Some file types don't match the cloud sync paradigm very well, such as system and app configuration, offline game saves, and all the stuff that needs to be at a precise place on the system.

- You've got stuff you want to manipulate as files in dirs, not as entries in an app. Power users usually dislike the loss of control and freedom that apps imply.

- You like to have 3 backups because of the rule "one on-site, one off-site".

- You like to have all your backups in the same place.

- You have collections of files that just don't fit in the cloud, such as terabytes of videos.

- Sex tapes still need to be backed up, and you won't send those to your Dropbox.

- Some people still have shitty internet connections and can't sync reliably.

- It's less work to manage one backup than to set up all the parameters of all those apps to be sure they sync only what you want.

- You read the licence for some sync platform and couldn't decently click "I accept".

- You've got a NAS at home onto which you plug the hard drive with the backups for the whole family.

What's wrong with sex tapes in the cloud?

Ask the celebrities who were the targets of the "fappening".

That's why you encrypt sensitive stuff before saving it on dropbox.

I'm glad I had cloudtobutt enabled for your comment.

It's not only paranoid people who keep offline backups, it's those that understand the threat models where data can be destroyed through logical or physical access to all storage locations.

For arguments sake, I'll focus Google services. As I personally discovered recently, docs offers minimal protection for your data:

    1. Docs shared with you (others are "owner") can disappear without notice.
    2. Manual clones are the only way to keep a copy of shared documents.
    3. The GDrive agent keeps no local copies of docs, only urls.
    4. Any deletions made >25 days ago are unrecoverable.
    5. Deleted accounts have only a 5 day undelete window.
When you combine these it means a document you've contributed to may quickly disappear forever as the result of mis-actions made by someone else or their employer/educational institution. You may have assumed "all changes saved to drive" meant into your account, but you'd be wrong.

Similarly removal of items from trash in GMail/GPhotos is permanent. If someone maliciously gained access to your Google account (or one of your devices) they could quickly and easily purge your data. These deletions would quickly and efficiently propagate to all your devices and purge the canonical "cloud" copy.

The accidental permanent deletion can also occur very easily if you use the GDrive native windows sync tool.

Basically if you don't have any offline backups it is easy to become totally screwed. 2TB drives are cheap.

The scenario you describe involves no true "backups". It's probably better to say that you only have original copies, albeit resilient to failure of local hardware. Cloud services have data-loss events, due to both technical and business reasons. They're vulnerable to a variety of SPOFs out of your control or visibility.

I've known so many people over the decades who trusted some online provider to reliably store the only copy of critical data, only to be burned. Even providers that you would have thought should be bulletproof.

My rule of thumb, based on data loss studies, is to have three copies of any critical data. Any cloud provider should only be considered to be one copy.

I have an upload speed of 50 kbyte/s at home. I'm not putting my 2TB NAS data into The Cloud™ (or even just my personal root server) any time soon.

I agree; cloud-based backups like Dropbox just sync in the background. Got any source code? Store it in GitHub or Bitbucket.

Documents and source code don't take up much space. Unless you want music and video files, which iTunes can always sync for you to your Apple mobile device.

I use a USB hard drive to back up my Thunderbird and Firefox profiles, plus my downloads and other stuff too big for Dropbox. $100 for a 4 TB USB 3.0 hard drive is cheap.

Sync is not backup. If you corrupt or delete a file, the destructive action gets propagated as well. Time Machine and many other backup systems give you access to previous versions of files and folders.
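Time-Machine-style previous versions can be approximated with hard links (this is the trick behind rsync's --link-dest). A hedged sketch with illustrative names, showing why each snapshot looks complete but only changed files consume new space:

```python
import os
import shutil

def snapshot(source, prev_snap, new_snap):
    """Copy source into new_snap, hard-linking unchanged files from
    prev_snap (pass None for the first snapshot)."""
    for dirpath, dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(new_snap, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            old = os.path.join(prev_snap, rel, name) if prev_snap else None
            dst = os.path.join(new_snap, rel, name)
            if old and os.path.exists(old) and \
               os.path.getmtime(old) >= os.path.getmtime(src):
                os.link(old, dst)          # unchanged: share the blocks
            else:
                shutil.copy2(src, dst)     # changed or new: real copy
```

A sync tool would just overwrite the one backup copy; here, deleting or corrupting a file on the source leaves every earlier snapshot's version intact.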

CCC was what I used back in the day (IIRC it used to be free) to make the most reliable bit-for-bit backups of OS X machines. Time Machine skipping over so many files is kind of a nuisance to me.

Unless you set the privacy in Time Machine to ignore certain drives/volumes/folders, you shouldn't really miss anything with a Time Machine backup. It usually ignores those files that aren't necessary and can be created by the OS or applications (like caches or temporary files, for example). What kinds of personal files did you lose with a Time Machine backup?

Of course you can see the full list at /System/Library/CoreServices/backupd.bundle/Contents/Resources/StdExclusions.plist.
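For the curious, that file is a standard plist, so it's easy to dump (a sketch: the key names inside StdExclusions.plist vary by OS release, so this just prints whatever top-level lists it finds rather than assuming specific keys):

```python
import plistlib

# Path taken from the comment above.
STD_EXCLUSIONS = ("/System/Library/CoreServices/backupd.bundle"
                  "/Contents/Resources/StdExclusions.plist")

def dump_exclusions(path=STD_EXCLUSIONS):
    with open(path, "rb") as f:
        data = plistlib.load(f)
    for key, value in sorted(data.items()):
        print(key)
        if isinstance(value, list):
            for entry in value:
                print(f"  {entry}")
    return data
```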

One of the biggest is iPhone backups from iTunes. If I were a normal user and my MacBook didn't boot up tomorrow, I'd be surprised to find my backup didn't include critical files like this. For example, apps that are no longer in the App Store but are in your backup can be restored. Once you don't have that backup anymore, they can't.

It's just one of the reasons Apple users are forced to have multiple backup services if they want reliable and complete backups.

I'm not sure why the iPhone backups on your system didn't get backed up. The device backups stored in ~/Library/Application Support/MobileSync/Backup/ are not in the exclusions file that I see.

P.S.: The latest app binary files may not get updated/synced on the Mac post iOS 9/iTunes upgrade.

I double checked my drive and you're correct with regard to mobile backups. That's reassuring.

It would be nice to have a utility to give a definitive diff of what's on your hard drive that's not on your: Time Machine drive / [Carbonite, Backblaze, etc] backup.
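The core of such a utility is small if you only compare which relative paths exist on each side; no checksums, just presence, which is fast and catches wholesale omissions (an entire skipped home folder shows up immediately). A sketch, with illustrative function names:

```python
import os

def relative_files(root):
    """Set of all file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found

def missing_from_backup(source_root, backup_root):
    """Paths present on the source but absent from the backup."""
    return sorted(relative_files(source_root) - relative_files(backup_root))

# e.g. for path in missing_from_backup("/Users", "/Volumes/TM/Users"):
#          print(path)
```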

I've migrated many Macs using Carbon Copy Cloner. Works fabulously and is a great tool.

> The AFP protocol is deprecated and cannot be used to share APFS formatted volumes.

Interesting.

That is really interesting. I noticed with El Capitan that it already defaults to SMB instead of AFP when you don't give a scheme.

AFP is also a security disaster. Check out the spec for "DHX2" https://developer.apple.com/library/mac/documentation/Networ...

Can you provide some details about what's wrong with it? (Not a security person, no idea how to evaluate such a claim)

AFP depends on having persistent, globally addressable IDs for files (CNIDs). Perhaps this is no longer available under APFS?

While I imagine this might be retrofittable (netatalk manages it, after all), I can't blame Apple for wanting to ditch AFP. It's an ancient protocol at this point, and I'm frankly just amazed it still works at all.

I'm very sad about the lack of AFP. It's so much easier to get going on Linux than Samba.

> Famously missing, but not the hardest thing to add considering all the features above: Compression (which HFS+ supports!)

HFS+ supports compression, so compression is not a new feature, so it's not on the New Features page. No problem!

I can live with that! Be sure to make it LZ4!

What will happen to Spotlight search on network shares? At the moment, they are based on AFP.

Maybe even an end to the constant self deletion of the backups due to "inconsistencies".

I find forcing users to SMB weird given how crap their driver is...

Crap driver? It's not a 'driver', and SMB has been natively supported for at least 8-10 years. It hasn't sucked in a looong time either. In fact, recent OS versions have defaulted to SMB v2 for file sharing.

I believe when they flipped the switch to default to SMB it was actually to SMB 3 in 10.10.

Anecdotally, I'd always found AFP in the Tiger and Leopard days to be faster than whichever version of SMB support was included at the time. Now I use the default SMB3 and it seems that 802.11ac and gigabit are the bottlenecks (of course, it's 10 years later, in the era of SSDs as well).

Apple deprecated AFP and switched to SMB2 by default in Mavericks/10.9 (https://www.apple.com/media/us/osx/2013/docs/OSX_Mavericks_C...). AFP was used only for Time Machine and connections to older Macs.

And it wasn't just anecdotally faster; I worked for a storage company specializing in Mac workflows and AFP was empirically several times faster, especially on 1GbE and 10GbE networks. This was in part due to Apple ditching Samba in Lion/10.7 (http://appleinsider.com/articles/11/03/23/inside_mac_os_x_10...) over GPL concerns and replaced it with their own shitty, incomplete implementation, which they didn't get up to Samba's standards until 10.10.

I remember this being excruciating since customers had to either buy a third-party SMB implementation like DAVE to get any value out of 10GbE connections, or hope that the applications they wanted to use over the network supported AFP.

The spartan description of APFS certainly sounds like the (partial) feature list for ZFS--the comparisons made in the comments here are on-point. ZFS though took around 5 years to ship and, arguably, another 5-10 to get right. I say this having shipped multiple products based on ZFS, writing code in ZFS, and diagnosing production problems with it.

On-disk consistency ("crash protection"), snapshots, encryption, and transactional interfaces ("atomic safe-save") will no doubt be incredibly valuable. I don't think, though, that APFS will dramatically improve upon the time it took ZFS to mature from a first product to world-class storage.

Some commenters have opined (despite Apple distributing ZFS for Mac OS X at WWDC nearly a decade ago) that ZFS would never be appropriate for the desktop, phone, or watch. True, ZFS was designed for servers and storage servers, but I don't think there's anything that makes it innately untenable in those environments, not even its default (but not essential) use of lots of RAM.

Who knows... maybe Apple have spent the decade since killing their internal ZFS port putting this new filesystem through its paces. Its level of completeness, though, would suggest otherwise.

Between the ZFS-like features and the advertised "novel copy-on-write metadata" scheme, I would not be surprised if it was partially based on DragonFly's HAMMER/HAMMER2 filesystem.

If you poke around in the the APFS kernel extension, it's not a very big binary, given the feature set. (It's a 550K extension, compared to the 2.5MB zfs.ko on FreeBSD.) I haven't disassembled it yet, but I'd wager that APFS is layered atop Core Storage btrees/CoW. Since that code's been shipping since 2011, maybe APFS stabilizes faster than a true greenfield filesystem?

(That does leave me wondering how interesting an open-source APFS would be without an open-source Core Storage.)

> It's a 550K extension, compared to the 2.5MB zfs.ko on FreeBSD

hammer.ko is 494KB in the latest DFBSD, for those wondering the obvious.

I was thinking the exact same thing!

It would be fantastic if they actively helped with HAMMER2 development.

It would be a strange kind of NIH that excluded ZFS but welcomed HAMMER...

different licensing and HAMMER demands significantly less memory


Yes. I was a heavy user of ZFS on Mac OS X back then. I was on the developer mailing list for that implementation, submitting bugs and talking to Apple developers who were working like mad to ship it.

Then Steve Jobs' buddy Larry bought Sun, and the licensing of ZFS became basically impossible to sort out in time. So they dropped it.

Was that 2007 or 2009?

2009. I remember ZFS was coming in Snow Leopard. Then it wasn't.



The Sun acquisition was announced in 2009 and closed in 2010. It may have been related, but I have my own suspicions about what caused Apple to jettison their port.

Given the mountain of legal shit Google has been subjected to over Java, the last thing Apple wanted was to have to answer to Larry's lawyers. ZFS, awesome as it is, has been poisoned.

It's like no one here remembers the NetApp lawsuit.

Do tell.

The thing that killed ZFS on the Mac was the NetApp lawsuit over it against Sun.

It's impossible to know how long they have been working on it. As I recall, Swift had been in development for 3+ years before it saw the light of day.

Yeah, I bet it's been in development for a while. People have been seriously bitching about HFS+ for at least a decade, and with good reason. Apple has been pretty much silent about it apart from briefly testing ZFS and I'd wager a guess that they started working on it shortly after they abandoned that, while they continued also bolting more stuff onto HFS+ in the meantime.

Unless I'm misreading, the page seems to indicate that only external volumes can be formatted with APFS. My guess is that they'll have limitations and HFS-defaults for several years, at least on macOS, as the file system matures.

Either way, what file system didn't take years to get right? There are so many possible edge cases with file systems that it not only takes a long time to sort them out, but also an amazing community and/or luck to reproduce them deterministically and fix them.

File systems take A LONG time to stabilize, and you want them stabilized well, because if there is a bug in httpd you just restart it; if there is a bug in the filesystem, you lose your data. See BTRFS: started in 2007 and it is still not stable enough to be supported in RHEL (and for good reasons). So, of course, Apple engineers are way better than the rest of the world and have magical pixie dust which makes bugs disappear, so they will fix bugs way faster than anybody else. Not.

It's good to see Apple catching up, no matter where it's coming from. You mention ZFS, but per-file encryption (EFS) and snapshots (VSS) in particular stood out to me as features NTFS has had for a decade.

Anyone know who works on APFS? If I were Apple I would have certainly picked up some of the ZFS core team, curious if any of them are currently at Apple.

Could speed up their time to market with such seasoned hands on board.

One of them worked on the 'Be File System'... that's interesting https://en.wikipedia.org/wiki/Dominic_Giampaolo

It seems ZFS was a no-go because of its license. But why does Apple develop its own APFS rather than port the open source BTRFS with all its features? NIH syndrome?

Probably still because of the license? Btrfs is GPL.

It will be interesting to see which license, if any, APFS is released under.

Siracusa will finally be happy. ;-)

Sounds an awful lot like ZFS (zero-cost clones, read-only snapshots), but could it be? I would imagine they'd start from scratch due to IP issues.

Clearly this is immature technology they want to get out for testing/evaluation before it's fully adopted even into their own products. (See below.)

- - - from the release notes - - -

As a developer preview of this technology, there are currently several limitations:

Startup Disk: APFS volumes cannot currently be used as a startup disk.

Case Sensitivity: Filenames are currently case-sensitive only.

Time Machine: Time Machine backups are not currently supported.

FileVault: APFS volumes cannot currently be encrypted using FileVault.

Fusion Drive: Fusion Drives cannot currently use APFS.

Another reason is that ZFS just isn't a fit for Apple devices. It's memory hungry and energy hungry, and it has several limitations compared to HFS+, like not being able to be resized.

ZFS will be happy on a system with only 512MB of RAM, no matter how much storage it manages (assuming only 1 pool). It does need more RAM than UFS, but the amount is not notable unless we are talking about systems with 32MB of RAM.

Being energy hungry relative to UFS and others is likely true due to things like checksum calculations and compression, but there is no way to implement these things without needing more cycles to compute them.

> Being energy hungry relative to UFS and others is likely true due to things like checksum calculations and compression, but there is no way to implement these things without needing more cycles to compute them.

Not so true now; people have added encryption and compression instructions to CPUs. I'd be surprised if Apple couldn't ask Intel for a couple of opcodes, and on the mobile platforms they do it anyway.

But why couldn't ZFS also take advantage of those opcodes?

It is platform dependent. ZFS does not do that on Linux yet in part because of GPL symbol restrictions and the fact that there are other things to develop right now, although there has been some work done in this area to use the instructions directly. It definitely takes advantage of them on Illumos. I am not sure about the other platforms.

xnu's loadable kext isolation means that you have a one-time hit on using anything beyond x86-64+sse2, which can be paid at kernel prelink time, kext load time, or while a kext is running, via a trap that switches call preamble/postamble to handle the extra state (and which facilitates selecting fast paths on a cpu-by-cpu basis, for example). Only the presence of x87 insns impose noticeable cost.

o3x builds and runs just fine with -O2 -march=native and the latest clang just by changing CC and CFLAGS; the kexts that get built aren't backwards compatible though (you'll get a panic if you build with -march=native on a machine that does AVX and run on a machine that doesn't).

The code that recent clang+llvm generates makes heavy use of the XMM and YMM registers, and does some substantial vectorization. The compression and checksumming and galois field code that's generated is strikingly better, although not quite as good as the hand tuned code in e.g. (https://github.com/zfsonlinux/zfs/pull/4439). It may be interesting to compare performance, but given that compression=lz4 and checksum=edonr has negligible CPU impact on a late 2012 4-core mac mini (core i7) even when doing enormous I/O (> 200k IOPS to a pair of Samsung 850 PROs), hand tuning likely won't make as much of a difference as moving up from compression=on, checksum=[sha256|fletcher4].

I'm pretty sure that once the hand tuned stuff is in ZOL it'll get looked at by lundman for possible integration.

I'd be surprised if it doesn't. AVX is really good at speeding up compression/checksumming algorithms, and AESNI is standard in most AES implementations nowadays.

Because not every opcode is made public. The "usual suspects" for SIMD and encryption are public, yes, but nothing stops Intel from adding opcodes so highly specialized that they essentially represent the exact program code of the filesystem.

ZFS gobbles RAM and almost certainly couldn't be made to run acceptably on the Apple Watch. No, this seems like something developed from scratch to meet their particular needs.

ZFS needs very little memory to run. Performance is definitely better with more RAM, but the overwhelming use of memory in ZFS is for cache. Eviction is not particularly efficient due to the cache being allocated from the SLAB allocator, but that is changing later this year.

Getting ZFS to run on the Apple Watch is definitely possible. I am not sure what acceptably means here. It is an ambiguous term.

The FreeBSD wiki (https://wiki.freebsd.org/ZFSTuningGuide) still claims:

> To use ZFS, at least 1 GB of memory is recommended (for all architectures) but more is helpful as ZFS needs lots of memory.

Is that inaccurate?

I've run the initial FreeBSD patchsets for ZFS support on a dual-P3 with 1.5 GB of RAM, so 1 GB for a recent version should be more than doable. ZFS on FreeBSD has become a lot better in low-memory situations.

There are two additional things to consider.

ZFS uses RAM mostly for aggressive caching to cover over both spinning disks and the iops tradeoff vdevs make over traditional raid arrays. Thus low memory is not such a big deal if you have a pool with a single SSD or NVMe device.

The other point to consider is that on at least the non-Solaris-derived platforms, the VFS layer does not speak ARC. So data is copied from an ARC object into a VFS object, taking up space in both. If you are able to adapt your platform to use the ARC as the direct VFS cache, you can save RAM that way as well.

The wiki page should be corrected. Saying "lots of memory" is somewhat ambiguous. If this were the 90s, then it would be right.

As for the recommended amount of system memory, recommended amounts are not the minimum amount which code requires to run. It in no way contradicts my point that the code itself does not need so much RAM to operate. However, it will perform better with more until your entire working set is in cache. At that point, more RAM offers no benefit. It is the same with any filesystem.

because it has its own FS cache.

ZFS de-duplication will eat all of the RAM you can throw at it.

Otherwise, it is basically the SLUB memory block allocator that was used in the Linux kernel for a while. So yes, it can run on watchOS-level amount of RAM.

ZFS data deduplication does not require much more RAM than the non-deduplicated case. Performance will depend heavily on IOPS when the DDT entries are not in cache, but the system will still run, just slowly, even with minuscule RAM.

In my experience, ZFS will choke out the rest of the system RAM, and swap like crazy, killing the system.

Kernel memory on the platforms where ZFS runs is not subject to swap, so something else happened on that system. The code itself is currently somewhat bad at freeing memory efficiently due to the use of SLAB allocation. A single long lived object in each slab will keep it from being freed. That will change later this year with the ABD work that will switch ZFS from slab-based buffers to lists of pages.

If dedup is off and the max ARC size is limited, it will use little memory (e.g. 512 MB of RAM for a 2x2TB RAID1 pool). I can say that from my own experience; I tried both approaches.

I probably should clarify that the system can definitely run unacceptably slowly when deduplication is used and memory is not sufficient for the DDT to be cached. My point is that saying RAM is needed would mean the software will not run at all without it, which is not true here.

Or when the cache is cold. It REALLY hurts to reboot while a deferred destroy on a big deduplicated snapshot is in progress. No import today for you!

Well, unless your medium has no seek penalty, which is what hurts with deduplication. Dedup on SSDs is pretty much OK, as long as your checksum performs reasonably (skein is reasonable; sha256 is not).
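The checksum-speed point is easy to get a feel for with Python's hashlib (skein isn't available there, so blake2b stands in as an example of a fast modern hash; a rough sketch, not a ZFS benchmark):

```python
import hashlib
import time

def throughput(name, data, rounds=20):
    """Rough MB/s for one hashlib algorithm over `data`."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data)
    elapsed = time.perf_counter() - start
    return (len(data) * rounds / 1e6) / elapsed

block = b"\x00" * (1 << 20)  # one 1 MiB record
for algo in ("sha256", "blake2b"):  # blake2b stands in for skein here
    print(f"{algo}: {throughput(algo, block):.0f} MB/s")
```

On most hardware without dedicated SHA extensions, the blake2b line will come out noticeably faster, which is the flavor of difference the comment is pointing at.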

DDTs that fit inside no-seek-penalty L2s don't hurt that much either, and big DDTs on spinny-disk pools are acceptable with persistent L2ARC, although it's risky: if the L2 fails, especially at import, you can have a big, highly deduplicated pool that isn't technically broken but is fundamentally useless, if not outright harmful, to the system importing it (or ESPECIALLY attempting to import it). "No returns from zpool(1) or zfs(1) commands for you today!"

When OpenZFS can eventually pin datasets and DDTs to specific vdevs (notably ones made of no-seek-penalty devices), heavy deduplication on big spinny-disk pools should be usable and reliable.

Until then, "well technically even if you have only ARC and it's very small, it will work, just slowly" while correct in the normal case, is unfortunately hiding some of the most frustrating downsides when things go wrong.

Interesting, because on Reddit's /r/DataHoarder they recommend a "1GB RAM per terabyte of storage" rule of thumb. [standard Reddit disclaimer]

That's if you're running deduplication (which is generally considered pointless for general purposes; it works very well for some workloads, but you really need to benchmark it beforehand considering its cost).

> Interesting, because on Reddit's /r/DataHoarder they recommend a "1GB RAM per terabyte of storage" rule of thumb.

The author meant deduplication, but that recommendation is wrong. A rule of the form "X amount of RAM per Y amount of storage" that applies to ZFS data deduplication is a mathematical impossibility.

You could need as little as 40MB of RAM per TB of unique data stored (16MB records) or as much as 160GB of RAM per TB of unique data stored (4KB records), both assuming default ARC settings. Notice that I say unique data and discuss records rather than simply saying data. There is a difference between the two. If you want to deduplicate data and maintain a certain level of performance, you will want to make sure RAM is sufficient to have a relatively high hit rate on the DDT. You can read about how to do that in my other post:
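For a rough sense of where that spread comes from: both endpoints are consistent with a per-entry DDT cache cost of about 640 bytes (my illustrative figure for this back-of-envelope calculation, not an official ZFS constant):

```python
def ddt_ram(unique_bytes, record_size, entry_bytes=640):
    """Rough RAM needed to cache the whole DDT.

    entry_bytes is an assumed per-entry in-cache cost, chosen so the
    result matches the 40MB / 160GB endpoints above; real costs vary.
    """
    return (unique_bytes // record_size) * entry_bytes

TB = 1 << 40
print(ddt_ram(TB, 16 << 20) / 2**20)  # 16MB records -> 40.0 (MB per TB)
print(ddt_ram(TB, 4 << 10) / 2**30)   # 4KB records  -> 160.0 (GB per TB)
```

The record size, not the raw capacity, dominates, which is why a single "X RAM per Y storage" rule can't work.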


It is not straightforward, and it depends on knowing things about your data that you probably do not. There is no magic bullet that will make data deduplication work well in every workload or make deduplication easy to calculate. However, if the data is already on ZFS, the zdb tool has a function that can figure out what the deduplication ratio would be, provided there is sufficient RAM for the DDT, which makes it impractical to run on a pool that is large relative to system memory.

ZFS' data deduplication is a very strict implementation that attempts to deduplicate everything subject to it against everything else subject to it and do so under the protection of a merkle tree. If you want it to do better, you will have to either give up strong data integrity or implement a probabilistic deduplication algorithm that misses cases. Neither of which are likely to become options in ZFS.

Anyway, deduplicating writes in ZFS is IOPS intensive, which is the origin of poor performance. There are 3 random seeks that must be done per deduplicated write IO. If the DDT is accessed often, it will find its way into cache and if all of those seeks are in cache, then your write performance will be good. If they are not in cache, you often end up hitting hardware IOPS limits on mechanical storage and even solid state storage. That is when performance drops.

If you are writing 128KB records on a deduplicated dataset on hardware limited to 150 IOPS, you are only going to manage 6.4MB/sec when you have all cache misses. If your records are 4KB in size, you will only manage 200KB/sec when you have all cache misses. However, ZFS will continue to operate even if every DDT lookup is a cache miss and you are hitting the hardware IOPS limit.
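The arithmetic behind those figures can be sketched as a toy model (illustrative only, not ZFS code): three random seeks per deduplicated write means a 150-IOPS disk completes about 50 writes per second.

```python
def dedup_write_throughput_kib(disk_iops, record_kib, seeks_per_write=3):
    """Worst-case (all cache misses) deduplicated write throughput in KiB/s.

    Assumes every write costs `seeks_per_write` random IOs, as in the
    comment above.
    """
    writes_per_sec = disk_iops / seeks_per_write
    return writes_per_sec * record_kib

print(dedup_write_throughput_kib(150, 128))  # 6400.0 KiB/s (the ~6.4MB/s above)
print(dedup_write_throughput_kib(150, 4))    # 200.0 KiB/s
```

Every DDT cache hit removes seeks from the denominator, which is why performance is fine until the DDT stops fitting in cache.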

The 1GB-per-TB rule of thumb is to handle the larger requirements of ARC caching, not the FS itself.

It's a performance guide, not a requirement.

The GB-per-TB rule of thumb exists so that on-disk files can be moved around or staged in RAM before a write operation, and so that open or recent files can be precached in RAM while being streamed; it is not really tied to the pool's total capacity.

For things like compression, hashing, encryption, block defragmenting and other operations, ZFS uses a lot of caching and indexing to avoid bottlenecks.

If Apple decides to implement APFS on RAID at a software level (e.g. to combat bitrot, or to sell consumer/business NAS/SAN products such as an upscaled Time Machine service for VMs), there will be questionable setups and comparisons to FreeNAS, Synology, unRAID, and other software storage options with a mixture of technology and hardware, where Apple won't be flexible or adaptive.

To argue for ZFS, requires understanding more about ZFS usage and performance scenarios.

It is very possible to run ZFS RAID-Z1 on 2GB of RAM or less, even for a 32TB pool; e.g. anyone is able to run 5x Seagate 8TB SMR archive drives in RAID-Z1 on 2GB of RAM.

It is usable. FreeNAS regularly hosts builds on less-than-optimal hardware setups.

It's also usable with 4GB, 8GB, 16GB, or 32GB of RAM, with varying performance benefits as features are enabled and the cache is expanded so the ARC can hold recently used files/pages/blocks.

Usually, ZFS performance is measured as throughput from empty to 90% full, and it changes drastically under these conditions when cache is limited.

On a system like this with SMR "archive" drives, the problem is often having a reliable cache of write data and, ideally, fewer fragments to store asynchronously; i.e. writing large files or modifying a large block is disk-IO limited. If it's being used to store archives, up to and including media files as a consumer device would, an optimal RAM size is hard to guess, given that people might store Blu-ray or UHD ISO files of ~40GB versus DVDs of 4-9GB, and streaming read/write of linear files would not use significant random IOPS.

With DB or VM storage, and consistent file blocks being written, the use case and performance requirements are just going to be different again, and this is where the 1GB-per-TB rule is both useful and unhelpful for diagnosing requirements.

ZFS has a lot of bottlenecks, usually CPU, RAM, and IOPS, but people focus on RAM since it is so much harder to expand or scale. And performance does not scale linearly with it.

Regardless, it's just impossible to guess optimal usage in a practical way, since there's almost no caching at all under 2GB; the ARC is very limited, and kernel panics are possible when memory is not tuned or capped to prevent expansion, at which point you are usually relying on CPU performance rather than disk performance.

At the high end of usage, performance can be managed by different methods such as L2ARC, ZIL, more RAM, more CPU, different pools, etc. Each with caveats and usually, non linear benefits.

Many NAS units that come with 2gb of RAM are capable of running ZFS, the problem is performance.

It's even possible to run ZFS on less than 1GB of RAM, but it's not going to be reliable or predictable unless you restrict the conditions of usage, i.e. limiting maximum file sizes, restricting vdevs or IOPS, etc. It would require heavy tuning for optimal task usage.

Especially as you start to hit the maximum storage limits of the pool, performance can be brutal without caching features: lower than 100KB/s when the ARC is busy or unoptimised. Usually, whatever the CPU can deliver from the drive IO without an IO or file cache will be very slow on NAS-level hardware, because a traditional NAS isn't CPU-bound.

Essentially, at the point where you can't start or run the performance features, there's no benefit from ZFS or CoW on smaller embedded devices unless it is needed.

From memory, and experience, you can use half a GB per TB of storage on Z1 storage, with some caveats, and have usable performance, as long as you keep file size and IO in mind.

With 4TB or larger drives, Z2 is recommended due to the impact of a drive failure on pool integrity; the rebuild/resilver times and error probability alone could allow data to be changed or corrupted during the resilver process.

This is just to combat entropy when reading terabytes of data and creating new checksums, given the probabilities involved with magnetic storage. Current and future drive density almost guarantees that errors will occur through entropy and decay of magnetic storage.
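To put a rough number on that risk: assuming the commonly quoted consumer-drive spec of one unrecoverable read error (URE) per 1e14 bits read, and that a Z1 rebuild of a 5x8TB pool must read the four surviving drives in full, the chance of hitting at least one URE during the resilver is high (all inputs here are illustrative assumptions):

```python
import math

def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error, assuming
    independent bit errors at the drive's specified URE rate."""
    bits = bytes_read * 8
    return 1 - math.exp(bits * math.log1p(-ure_per_bit))

# RAID-Z1 of 5x8TB: a rebuild reads the 4 surviving drives in full.
print(p_read_error(4 * 8e12))  # roughly 0.92 under these assumptions
```

Real drives often beat their URE spec, so treat this as an upper-bound argument for why Z2's second parity drive matters at these capacities, not a prediction.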

With deduplication, it needs to store multiple hashes and caches per device and per pool, which inflates sizes. About 5GB per TB is a good start. In most cases you would never need dedup, as it has an extreme cost and a narrow use case.

> Siracusa will finally be happy.

ATP will just be a concert of dings.

"FileVault: APFS volumes cannot currently be encrypted using FileVault. "

This one is confusing, because this is a logical volume feature. Not sure how or why APFS would ever care that some layer above it is encrypting stuff.

On the one hand, that may be an artificial limitation. If this turns out to have some bug that overwrites the encryption keys, they could of course say "we warned you", but it would not be good PR. Also, if beta developers report intermittent smaller data-losing bugs, they might want to study the affected drives to see what went wrong with it. Not having encryption enabled on them will make that a tiny bit easier.

On the other hand, if the new ability to partition drives with flexible partition sizes includes separate encryption keys per partition, and encryption/decryption is done by the block driver, they may have work to do to keep that block driver informed about what blocks should get encrypted with what key.

Err, per [1]

> APFS supports encryption natively. You can choose one of the following encryption models for each volume in a container: no encryption, single-key encryption, or multi-key encryption with per-file keys for file data and a separate key for sensitive metadata. APFS encryption uses AES-XTS or AES-CBC, depending on hardware. Multi-key encryption ensures the integrity of user data even when its physical security is compromised.

[1] https://developer.apple.com/library/prerelease/content/docum...

I wonder why they went this route.

Most apple volumes are already logical volumes (check diskutil list).

FileVault itself is, right now, implemented as part of corestorage (see diskutil cs for the encryption/decryption commands).

I assume they decided they just wanted to go the entire ZFS route and get rid of core storage in favor of a pool model, but still ...

Could be for performance reasons rather than a technical blocker. That said AES-NI instructions do make encryption pretty fast so it's anyone's guess at this stage.

What would be really neat is if this is because FileVault is going to be filesystem-level when they ship APFS.

It's wishful thinking, I know.

There is going to be some absolutely crazy sex in the Siracusa household this evening.

He's in SF w/ Casey and Marco... :/

Are they not his householders?

> It is optimized for Flash/SSD storage and features strong encryption, copy-on-write metadata, space sharing, cloning for files and directories, snapshots, fast directory sizing, atomic safe-save primitives, and improved file system fundamentals.

\o/ Hallelujah, something modern!

I was excited too, until I read the limitations section. It can't be used as the startup disk and doesn't work with Time Machine.

Hopefully this will change in the next macOS release.

Sure, this is a developer preview. Final release in 2017 will surely make this FS the default.

That's what people were hoping for with ReFS from Microsoft, which was released years ago, but that still hasn't happened yet.

btrfs was available in the Linux kernel in 2009, but didn't see production release in the tinkerer-friendly distros until 2012 and in an enterprise distro until 2015. These things take time: filesystems need to be absolutely bulletproof, especially in the consumer space where (unlike Linux) most users will have no idea what to do if something goes wrong. I'd say Microsoft is still on schedule.

Speaking of which, is there any other good FS to use for desktop Linux (on an SSD on ArchLinux) or is Ext4 still the recommended standard?

Just yesterday, I did my first Linux installation with ZFS as the root/boot filesystem (Ubuntu 16.04). This is after using it as the default filesystem on my FreeBSD systems for several years, and being very happy with it.

I've used Btrfs many times since the start, and been burned by dataloss-causing bugs each and every time, so I'm quite cautious about using or recommending it. I still have concerns about its stability and production-readiness. If in doubt, I'd stick with ext4.

Depends on your needs.

Stability, reliability: ext4/XFS

CoW, snapshots, multi-drive FS: ZFS/btrfs

SSD speed, longevity: F2FS

I've had F2FS on an Android tablet for many years; resurrected it. However, I'm running Debian on my laptop and I'm scared to try F2FS on / because I get warnings about it being not fully supported "yet". I would love to have an SSD-optimized FS on Linux. Since Apple will open-source the release version, is it conceivable that APFS could replace ext4 as the default Linux FS?

Do you think Apple will release it with GPL-compatible license?

Most Apple OSS stuff is released under Apache (Swift 2.2 is Apache 2), so probably?

I think it could be mentioned that most of the features regarding CoW and snapshots could be provided by LVM these days.

Without getting stability and reliability correct don't bother with the other features. What good is it if the filesystem handles oodles of drives if none of them have any of the data you put on them?

I use btrfs as my daily driver.

XFS, ZFS, btrfs

Apple is far more willing to switch to newer technologies than Microsoft is.

... and Microsoft is far more willing than Apple to put effort into backwards compatibility.

I don't know about that. Apple switched processor architectures twice, and both times software written for the old arch ran on the new one. And when they replaced their entire operating system, they not only made it so you could still program against the old API—just recompile and go—they also made it possible to run the old OS inside the new one so you could still run apps that hadn't yet been recompiled.

And before that, when Apple made the 68k -> PPC transition in the mid-90s, they ran the whole system under a crazy, bizarre emulator that allowed for function-level interworking - 68k code could directly call PowerPC code, and vice versa. Early PowerPC systems were in fact running an OS that was mostly composed of 68k code; it wasn't until 1998 or 1999 (around the release of the iMac) that most of the Toolbox was ported to PowerPC.

In the past, nobody did a better job of backwards compatibility than Microsoft.

Lately, Microsoft is showing that they aren't afraid to break things in the name of progress. If W10 is indeed the last version of Windows, maybe that's okay.

But wasn't that at the expense of clarity for new developers? I remember a horrible graduation exam where I had to code in Visual Studio.

The harshest part was not coding or UI; it was determining which versions of the different Windows APIs had a remote chance of working together smoothly. (It involved DB drivers and data grids.)

Perhaps. But I suspect there's a lot more extending and maintaining existing software than writing new software. For the former, backwards compatibility makes a huge difference.

I'm not saying Microsoft's approach is bad, just pointing out that there's a much higher chance of rapid adoption for this new filesystem.

Also, NTFS being a much better filesystem than HFS+, there was a lot less incentive to switch.

Too bad we still can't use NTFS flash drives on mac.

Although exFAT is at least somewhat promising.

One of the benefits of being vertically integrated.

OTOH Microsoft has a terrible track record of overpromising and underdelivering their next gen file system. I give Apple the benefit of doubt here. It is worded so that the limitations for the better part clearly sounds related to this being a preview release.

Apple has (relatively, you can replace some harddrives) the most control on hardware, so at least from that perspective it's easier for them.

Control over hardware doesn't really buy you anything here. Just about any hardware can use any filesystem with, in the worst case, the requirement that you have a small boot partition using the legacy filesystem.

Interestingly with SSD storage devices, control of the hardware can help a lot more as it can become possible to categorize, fully explore and if needed, ensure a particular behavior of commands like TRIM. Other filesystems have the unenviable task of running on any random piece of storage you throw at it, including things where the firmware straight up lies, or the hardware delays non-volatility past the point the filesystem assumes (potentially producing data loss in a crash) or similar types of problems.

Anyway. Overall, I think it's safe to say hardware control doesn't make most of filesystem development much simpler or easier. But there's a few interesting places it arguably does!

That doesn't really change anything about the filesystem design. A storage device can fail to write data it claims to have because of damage as well as design defects. When that happens, a reliable filesystem will detect it and a less reliable filesystem will catch on fire.

It also doesn't help to control 100% of the built-in storage if anybody can still plug in $GENERIC_USB_MASS_STORAGE_DEVICE and expect to use the same filesystem.

Many filesystems exist that do not run on a "plain" read/write block device, because storage based on flash is more complicated than the old random-sector-access magnetic hard drives. See for example UBIFS and JFFS2 on Linux.

Having full and direct low-level control of on-board SSDs could very well be advantageous for performance and longevity of the flash on modern macbooks. Things like combining TRIM with low-level wear leveling etc.

Taking advantage of the differences between flash and spinning rust only requires that you know which one you're running on.

Moving the wear leveling code into the OS where the filesystem can see it is an interesting idea but why aren't we doing that for all SSDs and operating systems then?

(Raw) flash and spinning rust are fundamentally different, because spinning-rust drives provide READ SECTOR and WRITE SECTOR primitives, while raw flash provides READ SECTOR, ERASE (large) BLOCK, and WRITE (small) SECTOR primitives. Filesystems like UBIFS do try to move the wear-leveling code into the OS. But the big players like Windows' NTFS and the Mac's HFS were originally designed for the spinning-rust primitives, so vendors of flash storage (SSD drives, USB sticks, etc.) had to provide a translation layer that emulates the spinning-rust primitives on top of the NAND flash primitives.

I'm sure various NAND flash vendors have different characteristics / spare blocks / secret sauce / defects that are masked by proprietary firmware, and probably see a significant business advantage in keeping those secret. Even something like building smarts into USB-stick firmware about how a FAT filesystem (the likely fs there) tends to rewrite the file allocation table far more heavily than file contents could prove an advantage.

So being the single vendor behind the entire stack, from the raw NAND flash memory to the motherboard it's soldered onto to the OS, is likely very advantageous.
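The translation layer described above can be sketched as a toy model (illustrative only; real FTLs add wear leveling, garbage collection, and bad-block management):

```python
class ToyFTL:
    """Minimal flash translation layer sketch: logical sectors are
    remapped to fresh flash pages on every write, because raw NAND
    can program a blank page but can only erase whole blocks."""

    def __init__(self, blocks=4, pages_per_block=8):
        self.pages_per_block = pages_per_block
        self.flash = [[None] * pages_per_block for _ in range(blocks)]
        self.mapping = {}        # logical sector -> (block, page)
        self.next_free = (0, 0)  # naive append-only allocator

    def write(self, sector, data):
        blk, pg = self.next_free
        self.flash[blk][pg] = data        # program one blank page
        self.mapping[sector] = (blk, pg)  # the old page is now stale
        pg += 1
        if pg == self.pages_per_block:    # a real FTL would GC stale
            blk, pg = blk + 1, 0          # blocks and erase them here
        self.next_free = (blk, pg)

    def read(self, sector):
        blk, pg = self.mapping[sector]
        return self.flash[blk][pg]

ftl = ToyFTL()
ftl.write(0, b"v1")
ftl.write(0, b"v2")  # rewrite lands on a new page; old page goes stale
print(ftl.read(0))   # b'v2'
```

The key point for the thread: the mapping table and its update policy are exactly the "secret sauce" that drive firmware hides from the OS, and that a vertically integrated vendor could instead expose to its filesystem.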

They have their secret sauce so that legacy software can pretend the SSD is spinning rust. Let them.

Why shouldn't we also demand standard low level primitives so that every OS can do the thing you're describing?

Of course a standard would be nice, but good luck getting everyone to agree on one before the end of the century :)

Already implemented in faster DSP from what I gather... http://arstechnica.com/apple/2011/12/apple-lays-down-half-a-...

Apple's EFI firmware has an HFS driver built into it. The way today's macOS boots is the firmware reads the bootloader off the boot partition created these days on Core Storage installations, and the bootloader (more correctly OSLoader) is what enables the firmware pre-boot environment to read core storage (encrypted or not) and thus find and load the kernel and kext cache and then boot.

How can it be the "default" when you can't use it on a Fusion Drive or any system volume?

This is a developer release. It's hardly likely that Apple is sinking dev resources into evolving a new OS filesystem without planning on the bootloader and backup functionality also being in place for the final release.

Copy-on-write and snapshots are excellent building blocks for Time Machine, so much in fact that, when it was announced, I suspected it was because of ZFS (which, at the time, was being considered for OSX). It's very likely TM will be adapted to work on it (with about 20 lines of code)

It's long overdue, but having spent months developing an implementation of bootable snapshots on OS X that works with HFS+ (http://macdaddy.io/mac-backup-software/), this kind of stings.

I'm not sure about snapshots, TM is made to work by just copying the directory structure over, and each backup is a fully functioning hierarchy. But cloning is definitely a big deal.

Time Machine uses hard links to create a duplicate directory structure without duplicating all the files themselves; only changed files need to be copied.

As I recall, HFS+ was explicitly modified to support directory hard links, which is less common in the Unix world, explicitly to support this feature.

TM also maintains a folder called /.MobileBackups to store temporary backups while your backup drive isn't connected. OS X also maintains /.fseventsd, a log of file system operations that TM can use to perform the next incremental, instead of having to compare each file for modifications.

it doesn't "just" copy the directory structure over each time; it creates the structure for each backup. Any files that have changed get copied in; ones that haven't are just hardlinked to the existing copies.
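The hard-link trick is easy to demonstrate with Python's os.link: two backup "copies" share one inode, so an unchanged file costs only a directory entry per backup (a sketch of the mechanism; Time Machine's actual bookkeeping is more involved):

```python
import os
import tempfile

# Two directory entries, one inode: the file's data is stored once,
# which is how unchanged files become nearly free in each backup.
d = tempfile.mkdtemp()
original = os.path.join(d, "backup1_file")
linked = os.path.join(d, "backup2_file")

with open(original, "w") as f:
    f.write("unchanged contents")

os.link(original, linked)  # "copy" into the next backup for free

assert os.stat(original).st_ino == os.stat(linked).st_ino
print(os.stat(original).st_nlink)  # 2: one inode, two names
```

Deleting an old backup then just unlinks one name; the data survives as long as any backup still references the inode.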

A snapshot is a copy of the file (and its blocks) at a given point in time. Subsequent writes to it will happen to new blocks, leaving the ones connected to the snapshot undisturbed.
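A toy model of that block-level behavior (illustrative only, not how APFS or ZFS actually lay out data):

```python
class ToyCowFile:
    """Sketch of copy-on-write snapshots: a snapshot pins the current
    block list; later writes allocate new blocks instead of mutating
    the blocks the snapshot still references."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live block list
        self.snapshots = {}

    def snapshot(self, name):
        # Cheap: copies block references only, not block contents.
        self.snapshots[name] = list(self.blocks)

    def write(self, index, data):
        self.blocks[index] = data   # new block for the live view only

f = ToyCowFile([b"aaa", b"bbb"])
f.snapshot("monday")
f.write(0, b"AAA")
print(f.blocks)               # [b'AAA', b'bbb']  (live view)
print(f.snapshots["monday"])  # [b'aaa', b'bbb']  (undisturbed)
```

This is why snapshots are instant and initially free: only blocks written after the snapshot consume new space.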

Well, consistency is important too in backups. So TM will probably make a snapshot and do the backup from there. Avoiding moved files from later dirs to already processed ones etc.

"Can't be used as the startup disk" is not necessarily a strong limitation; with FileVault enabled, you start up from your Recovery partition anyway. Even if they lose FileVault, I would guess they'll keep the Recovery partition setup (since they've invested in it a bit as a pseudo-BIOS, for changing things like SIP.) So that image can stay HFS and hold your boot kernel, while "Macintosh HD" can just be APFS. Sort of like Linux with /boot on FAT32 and / in LVM.

Linux /boot tends to be on ext3 or ext4 on most distributions. Recently it's XFS on the server flavor of Fedora, CentOS, and RHEL. For openSUSE the default is Btrfs, /boot is just a directory.

The bootloader/bootmanager is what determines what fs choices you have for /boot. GRUB2 reads anything, including ZFS, Btrfs, LUKS, even md/mdadm raid5/6 and even if it's degraded, and conventional LVM (not thinp stuff or the md raid support).

/boot on FAT32 is mostly an artifact of UEFI these days. When I set up BIOS-based systems, I usually had /boot on ext2.

Or even ext4. But I think the parent's point was that you'd keep your boot partition out of LVM.

Those limitations are obviously because it's in development.

The irony: a case-insensitive fs on case-sensitive "macOS"

> Filenames are currently case-sensitive only.

Will they deploy it case insensitive, still?

Presumably as with today, you'll have the option. I don't have a strong opinion on case sensitivity of file names, but I suspect they'll keep it case insensitive by default. I think for the average non-technical user that two files, "MyFile.txt" and "myfile.txt", being different could lead to some confusion, and Apple historically has apparently considered that confusion unacceptable.

The average user is also confused by "MyFile.txt" and " MyFile.txt" being different, or "Proposal II" and "Proposal 2" being different, but filesystems aren't usually built around that. I don't think case sensitivity is special enough to get that sort of treatment.

I believe the problem is also present for a large amount of third party software, making the move to case sensitive drives pretty hard to do:

[0] https://helpx.adobe.com/creative-suite/kb/error-case-sensiti... [1] http://apple.stackexchange.com/questions/192185/os-x-case-se... [2] http://dcatteeu.github.io/article/2015/12/31/case-sensitive-...

I've noticed some bugs with case-sensitivity recently in Ruby, of all things.

More problematic is that many case insensitive hard drives would be copied into new machines and there would be millions of conflicts. Some utility would have to sit there and annoy people by asking them to make decisions.

I can see why there would be conflicts going case sensitive -> case insensitive, but I can't see why there would be conflicts going the other way. Am I missing something?

If you have a file named "MyFile.txt" and another system is looking for "myfile.txt", then it'll not be found and Apple will not let you rename it because it thinks it's a no-op. That's frustrating as hell.

Apple's software (Finder, mv) lets you rename these. But it's true that some tools (I think git) get confused here.

Git init / clone on case-insensitive HFS+ sets `git config core.ignorecase true`, which can lead to confusing behaviour where it ignores a change in the case of a filename.

> The default is false, except git-clone(1) or git-init(1) will probe and set core.ignoreCase true if appropriate when the repository is created.


I think the parent meant the other way around too.

However, the transition between the case insensitive and case sensitive filesystems isn't going to happen overnight. People will be copying files around both ways for quite some time, so the insensitive -> sensitive case is still going to be a concern.

I think you have it backwards. If you try to expand an archive with FOO.TXT and foo.txt, what should happen if you're writing to a case insensitive file system?

    $ touch HI
    $ touch hi
    $ ls
    HI
So that's disturbing. Another problem is that every piece of software you can think of will be comparing two files case-insensitively. Almost weekly I get burned by this.

    $ touch HI
    $ test -f hi && echo ok

That's not convincing me that I have it backwards. I was responding to this point in the parent comment:

> many case insensitive hard drives would be copied into new machines and there would be millions of conflicts

I still don't see where you get a conflict copying the contents of a case-insensitive file system to a case-sensitive one.

> I still don't see where you get a conflict copying the contents of a case-insensitive file system to a case-sensitive one.

Because some apps create MyFile.txt and expect to be able to access it later by myfile.txt. Adobe's applications, for example.

I had no idea. That's horrifying.

Open up the various folders of Adobe's software (on Windows). The DLLs are a mish-mash of all lowercase and upper-lower mixes. Heck, open up System32; the DLLs there are definitely not case-sensitive capable (`kbd*.dll' being one example). In fact, I bet you there's at least one program on your computer that accesses the Program Files using `C:\PROGRAM FILES (X86)' instead of `%programfiles(x86)%'. In fact, even environment variables aren't case-sensitive.

For the end-user they could prevent duplicate different-cased file names in the UI layer (the Finder), instead of the file system. That would be a more appropriate place for it anyway.

And then some code using Unix APIs would create two files whose names differ only in case and the UI layer would choke. This is why spray-on usability is bad.

The UI already has to deal with that anyway because it supports case sensitive volumes. What exactly constitutes case is locale specific, it differs from one user to the next, that logic would be messy to have inside the file system.

That would likely be a hassle because you'd have to be consistent for all programs that ever save or read a file. As a result it has to be an OS-level thing at least, if not at the file system level. I don't have a huge preference (case sensitive or insensitive), I think it's not worth a religious war, but whatever the choice is, it should be completely transparent to understand what convention the system is using as a coder, and as a general user.

This is a horrible solution.

Steam relies on the filesystem being insensitive.

Steam on the Mac does, or at least did the last time I tried to use it on a case-sensitive partition. It's not that Steam inherently needs case-insensitivity; it's that the main app refers to some of its files with different case than what's on disk. So without a case-insensitive FS it cannot find some files. A stupid problem, really.

No it doesn't. How would it work on Linux if it did?

Both you and cuddlybacon are right. A long time ago, Valve worked under the assumption that the FS is case-sensitive, and all the games I installed back then had title-cased folder names. Gradually, Valve stopped caring about this, and my games stopped working. I had to go in and manually change some game folder names to lower case. It then kept some small files under SteamApps and the downloaded games under steamapps. They have fixed that now. Now, I have both CONFIG and config in my Steam folder.

How would it work? By a combination of magic and "we can't be bothered; the users should figure out something".

Really? What filesystem do they use in SteamOS?

SteamOS is Linux, based on Ubuntu. So I assume it is ext4, ext3, or xfs.

Btrfs is not stable enough IMO for something like SteamOS.

SteamOS is based on an older Debian release and uses ext4.

I know Unreal Engine 4 does, but Steam does not.

Does Adobe still?

Yes they do.

The fs should be case sensitive. If they want to enforce insensitivity it should be done with APIs for programs including the Finder.

According to their documented "Current Limitations" (https://developer.apple.com/library/prerelease/content/docum...):

> Case Sensitivity: Filenames are currently case-sensitive only.

First thought: they have seen the light!

A moment later: wait...they consider this a "limitation", and it's only "currently" the case. So maybe they're going to perpetuate the brain-damage anyway.


There's plenty of code in the wild that assumes case-insensitivity since that's been the case since forever.

Backwards compatibility is going to end up trumping whatever ideological purity case sensitivity represents.

Just as with HFS+ and ZFS, case-insensitivity will be an option.

Ok, what is the argument against case preserving but insensitive file systems?

It pushes a localization and UI problem down into the filesystem layer. Case-insensitivity is pretty easy for US-ASCII, but in release 2 of your filesystem you realize you didn't properly handle LATIN WIDE characters, the Cyrillic alphabet, etc. By release 7 of your FS you get case sensitivity correct for Klingon, but some popular video game relied on everything except Klingon being case-insensitive on your FS, and now all of its users are complaining.

How do you handle the case where the only difference between two file names is that one uses Latin wide characters and the other uses Latin characters? This one bit me when writing a CAPTCHA system back in 2004. (Long story, but existing systems wouldn't work between a credit card processing server that had to validate in Perl, and a web form that had to be written in PHP, where the two systems couldn't share a file system. It's simple enough to do using HMAC and a shared key between the two servers, but for some reason, none of the available solutions did it.)

I noticed that Japanese users had a disturbingly high CAPTCHA failure rate. It turns out that many East Asian languages have characters that are roughly square, and most Latin characters are roughly half as wide as they are tall, so mixing the two looks odd. So, Unicode has a whole set of Latin wide characters that are the same as the Latin characters we use in English, except they're roughly square, so they look better when mixed with Unified Han and other characters. Apparently most Japanese web browsers (or maybe it's an OS level keyboard layout setting) will by default emit Latin wide unicode code points when the user types Latin characters.

Whether or not to normalize wide Latin characters to Latin characters is a highly context-dependent choice. In my case, it was definitely necessary, but in other cases it will throw out necessary information and make documents look ugly/odd. Good arguments can be made both ways about how a case-insensitive filesystem should handle Latin wide characters, and that's a relatively simple case.

Most users don't type names of existing files, exclusively accessing files through menus, file pickers, and the OS's graphical command shell (Finder/Explorer). So, if you want to avoid users getting confused over similar file names, that can be handled at file creation time (as well as more subtle issues that are actually more likely to confuse users, such as file names that have two consecutive spaces, etc., etc.) via UI improvements.
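The fullwidth-vs-ASCII ambiguity described above is, at bottom, a normalization question. A small Python illustration (mine, not anything Apple specifies):

```python
import unicodedata

wide = "ＡＢＣ１２３"  # fullwidth Latin and digits, as a Japanese IME might emit

# Canonical normalization (NFC/NFD) deliberately preserves them...
assert unicodedata.normalize("NFC", wide) == wide

# ...while compatibility normalization (NFKC) folds them to plain ASCII.
assert unicodedata.normalize("NFKC", wide) == "ABC123"
```

Whether a system should apply the NFKC-style folding is exactly the context-dependent call made in the comment: right for a CAPTCHA validator, lossy for a filesystem.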

Just saw a comment in another thread stating Apple had slipped in improving their Unix layer, and here comes this. A new file system is not a joking matter: Microsoft failed to deliver their new FS; Linux took years to go from ext2 to ext3 to ext4, and btrfs is in forever-testing; most of *BSD still use their old ones; ZFS took a decade to become mainstream...

The information currently is very scarce on this one, but I hope they would at least test it REALLY WELL.

And Linux has the blessing/curse of having a million distros, each implementing their installer a little differently, exposing subtle bugs and inconsistencies in the FS/bootloader department.

And AFAIK, GNU Grub hasn't done a release in about four years now, and all the distros are using their own, custom beta build of it. It's a bit of a mess.

Didn't Microsoft introduce ReFS with many modern FS features?

Yes, and the list of NTFS features it dropped was just as long.

They did; it's more fault-tolerant than NTFS. I am using it on my Windows machine for my sensitive data, with Storage Spaces.

They don't consider it primetime yet.

Many distros didn't support other file systems (JFS, XFS, et al back in the day) officially, but you could usually find a way to use them as boot or a forked installer.

I realize new file systems are difficult, but HFS+ is just an ancient mess that has needed replacing for a long while. This isn't new and innovative so much as finally getting around to removing technical debt and catching up with the rest of the world.

Windows and WinFS is a bad comparison. WinFS was just a tagging/metadata system on top of NTFS with a SQL storage backend. We're still quite far from the ability to tag files with custom metadata and have it be easily queryable using default file chooser dialogues.

FWIW this took ten years. Apple tried to migrate to ZFS and had to back off over license issues. We've been waiting since then.

SailfishOS uses btrfs.

Which comment was that? I'm curious to know what they did with Unix layer.

Sounds like a useless Apple project that is going to die pretty soon :(

I'm going to be very cynical and say that based on their track record of major OS overhauls I am _not_ looking forward to the bugs and issues that will sneak past their QA. I still have nightmares of all the bugs with their wifi & USB stack changes in the last couple OSX releases. Please Apple, for your own good don't 'move fast and break things' with the filesystem. At the very least it's time for everyone to make sure they have a good backup system in place before touching this thing.

Compared to how risky and problematic HFS+ is? It may still be better.

Exchanging problems you know for problems you don't know.

I think they'll only call this 'stable' once they enable it on new iPhones by default. Which might be 2017, 2018, …

While I am happy that Apple have at last committed to replacing HFS+, I'm wondering why they didn't use ZFS rather than reinventing the wheel. It's not like it's a particularly easy wheel to reinvent either; the amount of effort which goes into a filesystem like ZFS is non-trivial. Why not build on top of that?

I would have greatly appreciated being able to use ZFS with MacOS X, for datasets, snapshots, sending them to remote pools for backup etc. It would have made it directly interoperable with a lot of pre-existing and cross-platform infrastructure. (I couldn't care less if it didn't scale down to the "watch". Filesystems are not a one-size-fits-all affair.) I find it great that I can take a set of disks from e.g. Linux, run "zpool export", pull them out, and then shovel them into a FreeBSD system, run "zpool import" and have the pool and datasets reassembled and automatically mounted. Perfectly transparent interoperability and portability. While Apple like to do their own thing, this is one place I would have definitely appreciated some down to earth pragmatism and re-use of existing battle-tested and widely used technology.

I think the barriers to getting ZFS into OS-X were probably legal, not technical ones. There's long history of the CDDL tripping up reasonable attempts to use ZFS and I doubt Apple would have wanted to proceed without approval and/or alternate licensing. Likewise Oracle has no real reason to let them do that without writing a check with lots of zeros on it.

There's also a decent chunk of what ZFS supports (particularly flexible volume pool management) that would be useless on almost every machine that Apple makes and sells. Your example of pulling a drive out of one machine and putting it in another is either impossible (soldered-on storage) or highly unlikely (user-serviceable SSD, but hidden behind a bunch of pentalobe screws) with their modern hardware lineup.

Apple should have bought Sun at the time, and cross-pollinate Mac OS X and Solaris while using OpenSolaris to attack the server/cloud market.

If Apple bought every single company that people say "Apple should buy X" about, there would be no company left on Earth besides Apple.

How does AAPL feel about the CDDL?

Given that there are other companies shipping ZFS without Oracle's blessing, I am curious why Apple could not do the same? (honest question)

Is it that Apple is a much bigger target? Patent issues?

Many of those companies probably started using ZFS before Oracle acquired Sun. After Oracle acquired Sun the risks of using ZFS skyrocketed legally and Apple had the option of just not taking on that risk since they hadn't shipped it yet.

Given what we know about Oracle that was probably the right call.

They ship dtrace.

I read the link and have a pretty good understanding of how computers and file systems work, but can someone "explain like I'm a decently intelligent programmer" what is different about this file system? Thank you.

This bit from 2012 by John Siracusa outlines everything “wrong” with the old file system, HFS+, and doubles as a sort of guide to what this new file system fixes.


They simply started from scratch with an aim to

* best support modern hardware technologies,

* start from the state of the art in file systems (like ZFS and btrfs, old as they may be), and also

* emphasize security.

Sounds a lot like what Apple did with Swift.

It's Apple reinventing the ZFS and btrfs wheels. Which are already mature, usable and open source.

Google them and see what they do well, and you might get some ideas.

Both filesystems have copyleft licenses, so that may factor into Apple's decision.

Hopefully this will be the end of .DS_Store and the crazy unicode normalization issue that causes mixups when rsync-roundtripping a directory structure between ext4 and HFS! :)

Unicode support on filesystems is just an ugly mess, and APFS is unlikely to fix this (the brief documentation doesn't mention it at all).

Let's recap:

- On OS X HFS+, filenames are stored using a "variant" of NFD, where some characters are precomposed "for compatibility with old Mac text encodings" (https://developer.apple.com/library/mac/qa/qa1173/_index.htm...).

- On Windows NTFS, filenames are "opaque sequences of WCHARs", and are thus "kind of" UTF-16 with no formally required normalization format. Windows itself tries to use NFC, but applications are free to use the Windows APIs to create a filename with anything they like. Since filenames are just sequences of 16-bit WCHARs, dangling surrogate pairs are allowed (and can break all sorts of code!)

- On Linux, filenames are opaque sequences of 8-bit characters. The only requirement is that a filename not contain either a slash or NUL character. No other formal specification exists, although "most" users these days use UTF-8. (However, you can and will find loads of filesystems with invalid UTF-8, usually because filenames are in one of the ISO encodings instead).

Multiple programming languages have been bitten by the possibility of invalid Unicode in filenames (see for example: rust (https://github.com/rust-lang/rust/issues/12056), Python (https://www.python.org/dev/peps/pep-0383/)). This mess is pretty much never going to go away, either, because filesystems are extremely durable and long-lasting.
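The HFS+-vs-Linux mismatch in the recap above is easy to reproduce. A minimal Python illustration (not tied to any particular filesystem):

```python
import unicodedata

nfc = "caf\u00e9"                        # 'é' as one precomposed code point
nfd = unicodedata.normalize("NFD", nfc)  # 'e' followed by U+0301 combining acute

# Visually identical, but different code point (and byte) sequences:
assert nfc != nfd
assert (len(nfc.encode("utf-8")), len(nfd.encode("utf-8"))) == (5, 6)

# HFS+ stores (a variant of) NFD while Linux stores the bytes verbatim, so
# a naive round-trip can leave both spellings of "café" on the Linux side.
```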

So the main thing I've observed is rsync'ing web files from a linux server to an osx laptop, and back to the linux server, ends up with a bunch of duplicate decomposed utf8 filenames on the linux server. It can be avoided with careful use of rsync's "--iconv=utf-8-mac,utf-8" etcetera, but it feels super-unnecessary. Fix it already :D

More details here http://serverfault.com/a/427200

PS: I've been using LC_CTYPE="whatever.ISO8859-1" and an ISO-8859-1 Terminal.app locale forever since I seem to keep dragging a bunch of legacy filenames around (having started on MS-DOS and FreeBSD 2.2) and ISO-8859-1 still seems to be the only locale that lets me "see the bytes" matrix-style instead of a random amount of "?" chars. Curiously, Finder.app seems to keep up very very well despite the odd encoding. Crossing my fingers the new APFS will act more like Linux.

PPS: Java is especially hilarious when launched with -Dfile.encoding=utf-8 as it is literally impossible to access some files from there.

And don't get me started on .DS_Store :P

Doubt that. .DS_Store files are created by the Finder, not the file system.

Maybe Finder could stash that stuff elsewhere if the new file system accommodates metadata on directories, or such.

As of the first macOS Sierra developer beta version there are no .DS_Store files on APFS volumes.

Uh interesting thought:

With space sharing and multiple logical volumes, there will probably be a shift to encrypting users' home directories separately.

Probably even an unencrypted System Base, which is read-only and protected by rootless anyway, so the system can boot without user interaction.

This also finally allows the most user friendly implementation of file encryption: Encrypted Folders, supported by the OS.

Sounds cool, I'd be willing to help with a rust port once the source is available :) (case sensitive only - naturally)

The point of encrypting the base system image is mostly to make it part of the Secure Boot chain. rootless is a policy control to stop processes from modifying the OS from inside. An encrypted system image stops things with hardware access (but not the unlock key) from modifying the OS from the outside. You can trust any disk whose blocks you can decrypt with key X, to have been only written to by someone with key X.

Why would you need encryption for the Secure Boot chain? Would you not rather sign the base system image for that purpose?

You'd have to sit there and read the entire base system image before booting to verify that signature. But you'd only have to decrypt a block when it comes time to read that block—with the ability to kernel-abort right then during the boot process if the block doesn't decrypt. (Or were you suggesting individually signing every block?)

You don't sign the whole image as a stream, and you don't sign every block. Recursion is your friend! You sign the Merkle tree root, check it once, and then check O(log n) hashes per block access. You can, of course, amortize the checking of the first several of those hashes as a further optimization that ties in easily with your caching layer.

There's no such thing as "the block doesn't decrypt" absent MACs/MICs or AEAD schemes -- encryption and decryption are just maps from N bytes to N bytes.
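To illustrate the parent's point, here is a hedged Python sketch of Merkle-tree block verification (names and structure are mine, not from any real boot chain): sign the root once, then verify each block read against O(log n) sibling hashes rather than re-reading the whole image.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash each block, then pairwise-hash upward to a single root."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate an odd leaf out
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(blocks, index):
    """Sibling hashes needed to verify blocks[index]: O(log n) of them."""
    level = [h(b) for b in blocks]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        # (am I the right child?, my sibling's hash)
        proof.append((index % 2, level[index ^ 1]))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_block(block, proof, root):
    """Recompute the path to the root; only the root needs a signature check."""
    acc = h(block)
    for is_right, sibling in proof:
        acc = h(sibling + acc) if is_right else h(acc + sibling)
    return acc == root

blocks = [b"boot", b"kernel", b"initrd", b"rootfs"]
root = merkle_root(blocks)                  # this root is what gets signed once
assert verify_block(b"initrd", merkle_proof(blocks, 2), root)
assert not verify_block(b"tampered", merkle_proof(blocks, 2), root)
```

The amortization mentioned above corresponds to caching the already-checked interior hashes so hot paths only verify the leaf level. This is roughly the design of Linux's dm-verity for read-only images.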

thanks for mansplaining rootless and encryption (/s).

Encryption is not authentication, so it does not prevent unnoticed modification.

And secure boot only really depends on authentication and not on encryption, so you are conflating two different concepts.

No, you can't just trust the data; replay attacks over time and space are still an issue.

I'm not the one conflating them; it's the Secure Boot people who think this is a good idea. Full-Disk Encryption is the defined "OS" stage of the Secure Boot chain-of-trust today, acting as an "optimization" (heh) over signing disk blocks.

It's certainly more secure (indeed, it prevents replay attacks) to just keep a big block-hash table, update it when blocks change, and then hash that table and sign it on fsync—but it's costly in a few ways over just trusting unauthenticated encryption, and was even more so five-to-eight years ago when Secure Boot was being formulated.

These days, you see a lot of wholly-signed read-only OS images—the OSX recovery partition is signed; CoreOS signs its OS images; most firmware is signed; etc. But I don't expect the unauthenticated encryption on most computers' read-write rootfs will be replaced by a signed-but-unencrypted filesystem any time soon—if just for the fact that consumers really seem to hate the idea of separate OS and data partitions, especially when the OS partition is "stealing space" they could be using for data. (The only thing I can think of that might finally kill this is making the default install on some consumer OS create a thin pool, such that an OS partition that only contains 5G of data only "steals" 5G of their "space.")

Or, y'know, authenticated encryption. Do any block-device cryptosystems support an AEAD mode yet? LUKS maybe?

> "You can trust any disk whose blocks you can decrypt with key X, to have been only written to by someone with key X."

No, you cannot. That is your sentence, not one from the Secure Boot people.

Also, I assume that APFS will support encrypted and unencrypted logical FSes in the same space-sharing FS instance. So the separate OS partition is just an unencrypted logical FS, which is what I meant in my original post.

GELI and AES-GCM are authenticated. Not sure if GCM has equivalent properties to GELI's HMAC feature, but it's probably good enough.


> Do any block-device cryptosystems support an AEAD mode yet?

FreeBSD's geli supports authenticating data with HMACs.



Thank you for injecting sexism into a technical discussion.

white male 20-something here, I find it funny.

Wow! This makes me feel good. Maybe Apple gets that many of its users use its devices because they present a proper UNIX desktop environment. And that environment needs to be cherished and it needs to evolve.

This is not a small thing. We had a nice visual overhaul two years ago; now Apple needs to pick it up on the under-the-hood level.

It would be really nice if there were a modern file system that "just worked" regardless of device. I'm tired of exFAT/FAT being the only filesystems I can reliably use on multiple different OSes painlessly, and even then I can't use them for all the functions of those OSes (no Time Machine). Hopefully this will be open enough to enable that, though who knows how it will shake out.

https://en.wikipedia.org/wiki/Universal_Disk_Format ? That looked hopeful last I was juggling with this sort of thing, but that was long enough ago that WinXP support was the killer.

Getting UDF to work with rewritable media, like usb sticks, is an unholy pain in the arse. It involves trying to remember an arcane combination of versions and feature flags to get the universal format to actually be universal. I managed it once, but after that, realized that the network was fast enough for most of what I needed and I tend to not have the four gig files that cause problems with FAT filesystems.

From what I understand, UDF is primarily targeted at one-time-recordable media. (It's used in DVDs, for example.) It's unclear how well it works for primary storage, but I suspect it's clumsy at best.

It's only got to beat FAT. That's a pretty low bar.

One that it doesn't meet. Try to read and understand the UDF spec sometime.

> You can share APFS formatted volumes using the SMB network file sharing protocol. The AFP protocol is deprecated and cannot be used to share APFS formatted volumes.

AFP deprecated too.

I didn't see any mentions of check-summing. Please tell me this brand new filesystem won't be susceptible to bitrot?

Much like other modern filesystems such as ZFS and BTRFS it supports multiple filesystems in a shared storage space, snapshots, copy-on-write and also has encryption built-in as a first class feature including multiple keys and per-file keys.

There are many limitations in the developer beta so this is clearly still very much a work in progress. Getting these file-systems right is traditionally difficult and can take years (see ZFS, BTRFS) so it will be interesting to see how well it does.

It's the transparent compression I'm using btrfs for. Only btrfs, ZFS, and NTFS support it, so there aren't many choices for that scenario.
