If those can be sent: Finally Time Machine done right.
> The AFP protocol is deprecated and cannot be used to share APFS formatted volumes.
Interesting.
> An open source implementation is not available at this time. Apple plans to document and publish the APFS volume format when Apple File System is released in 2017.
> encryption models for each volume in a container: no encryption, single-key encryption, or multi-key encryption with per-file keys for file data and a separate key for sensitive metadata
Nice. I hope they also include checksums for each block.
Famously missing, but not the hardest thing to add considering all the features above: Compression (which HFS+ supports!)
> If those can be sent: Finally Time Machine done right.
If this ends up being true, I can't wait. It's so frustrating watching tiny incremental backups take forever over the network. It seems like "Preparing" and "Cleaning up" take longer than moving the data.
> it's so frustrating watching tiny incremental backups take forever over the network
While waiting for APFS to become stable, buy Carbon Copy Cloner. $40. I love it. It's fundamentally rsync, but tailored to OS X. For personal use a single license covers an entire household.
Every night CCC fires up on each laptop and each does an incremental clone to its own dedicated directory on my desktop machine. This clone is usually less than a few GB and takes about a minute to run. CCC doesn't have to be run daily, it can be told to run hourly instead.
Time Machine running on my desktop then copies everything off to yet another disk.
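CCC's internals aren't described here, so what follows only sketches the rsync-style idea the comment refers to. By default, rsync's "quick check" treats a file as unchanged when its size and modification time both match, which is why a nightly incremental clone usually has only a few GB to copy. The function names are hypothetical. The sketch also ignores deletions, ownership, ACLs and extended attributes, all of which a real cloning tool has to handle.

```python
import os
import shutil


def needs_copy(src: str, dst: str) -> bool:
    """rsync-style quick check: copy when the destination is missing or its
    size or whole-second modification time differs from the source."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def incremental_clone(src_root: str, dst_root: str) -> list[str]:
    """Copy only new or changed files from src_root into dst_root and
    return their paths relative to src_root. Deletions are NOT propagated
    (an assumption of this sketch)."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        os.makedirs(os.path.join(dst_root, rel_dir), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, rel_dir, name)
            if needs_copy(src, dst):
                # copy2 also copies the mtime, so the next run skips this file
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

On a second run with nothing changed, incremental_clone returns an empty list. Only files whose size or mtime moved are copied again.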
So my backup environment is:
- each laptop uses CCC to periodically clone to the desktop
  - the clones on the desktop are "traditional"
  - CCC is told not to keep its own copies of modified files
- the desktop runs Time Machine
  - it makes hourly backups to a TM disk
  - old versions of files can be found there
This gives me 3 copies of all data on each laptop:
1) on the laptop
2) on the desktop
3) on the desktop's Time Machine Volume
If you create your users in the same order on each machine, or later go back and change the User IDs to match (under advanced options in Users and Groups) then the uids will match everywhere, and all files can be easily browsed in each location, with all file permissions intact.
18 years ago, as a teenager, I misconfigured CCC, only to find out after a full erase/reinstall upgrade of Mac OS that only the active user's folders had been synced.
This was my epic "there are two kinds of people: those who have lost data and those who will lose data" story. All of my dad's and mum's files were gone; luckily nothing professional and no pictures (analog cameras still ruled back then).
This is just a reminder that no tool is ever perfect. CCC is great; Time Machine is sluggish but "dumbproof" to a certain extent (but you have to keep faith in a black box).
Personally, I'm pretty paranoid, so I do my monthly off-site backup by rebooting my Mac into the recovery partition and making a full disk copy to an external drive with Disk Utility (because ultimately that is the recovery tool I'll be using if things go bad and I need my off-site backup...).
It's really sluggish, so I run it while I sleep. I get that professionals with more than 1 TB will need something more serious anyway.
I appreciate this was a long time ago and you've learned a lot since, but your story perfectly illustrates the importance of not just performing backups, but periodically testing them as well. That way you eliminate your blind faith in a black box by ensuring you actually have a working solution to fallback on.
In my case (and I can't emphasize my teenage status back then enough), it was a permissions issue: everything looked fine from my perspective, but CCC somehow didn't have the right permissions to sync other users' folders. So their home folders were actually empty.
From my perspective the backup was a success... This traumatizing error is when I first learned about the superuser and permissions. I guess CCC has come a long way since then and added more safeguards. But I must confess that I've stuck to "use the goddamn standard tools" ever since.
Sure, there must be a lot of more customizable or faster tools out there. But Time Machine remains "the backup tool my mum can't use wrong", and I can't give Apple enough credit for that. And putting aside the NSA/FBI problem, iCloud backup for iPhone is just as smooth. Literally every time I go to a Genius Bar, someone is about to hug a Genius because their broken or stolen iPhone was successfully restored from iCloud.
This is how you build a "faithful" customer base, and no other tech company understands that better than Apple so far.
Those points don't change the fact that you (and everyone else) should test backups periodically.
If it's not file system permissions or other configuration problems, then it will be a failing storage medium or just dumb user error. So the only way to be sure you have a working backup is to test that backup, ideally before you actually need to use it.
I got a little lost while writing. But indeed, my point was: double-check! In my case, everything looked fine at first sight.
This is also why I don't use the same process for my two backups (daily/monthly). To be fair, you can't constantly check that your daily/hourly backup isn't corrupted; you rely on its built-in safeguards and check it occasionally.
But by using two different methods for daily vs. monthly (off-site) backups, you significantly reduce the odds that both fail at the same time, right when you need them. (Also, a monthly full clone is way easier to check than a Time Machine disk.)
Yeah, once upon a time at a startup, I wanted a file restored. We were small; the CEO was actually the one doing the backups (to QIC tape cartridges).
Of course, the recovery failed. In fact, the backups hadn't been working properly for months. I lost a few hours of work; if our file server had actually failed, not having a good backup could have literally been catastrophic. As in would the company have survived?!
I recall an ancient quip, more or less: "if you don't test your backups, you don't have backups, you have dreams".
I keep hearing this, but as a Mac user who uses Time Machine and CCC, what's a good way to test the backups? After all deleting my only system to attempt a restore seems even more dangerous.
Here's a way to do it. You need to feel comfortable at the OS X terminal command line. You rely only on OS X's built-in programs, so it's an independent way to check whether your backup program is doing the right thing.
Then I would run the same type of script in the destination directory and diff the results.
There are some annoyances. E.g. (from memory) the ~/Library/Caches files aren't backed up, so they will be missing at the destination. I wrote some sed commands to remove some of these from the resulting checksums first, to keep my diffs more manageable.
Instead of the above, I currently use a Python script I wrote that lets me fine-tune things. At its heart, it uses Python's os.walk (run as root, of course) to traverse a directory tree. For each file I use hashlib.sha256 to generate a checksum. I also skip certain directories. Etc.
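Here is a minimal sketch of the kind of Python checksum walker described above, using only the standard library. os.walk traverses the tree, hashlib.sha256 hashes each file in chunks, and pruning dirnames in place keeps os.walk out of skipped directories. SKIP_DIRS and the output format are illustrative assumptions, not the commenter's actual script. The output is one "digest, two spaces, relative path" line per file, which is the layout shasum prints.

```python
import hashlib
import os

# Directory names never descended into; this particular set is an assumption
# (the comment above mentions ~/Library/Caches as one such case).
SKIP_DIRS = {"Caches", ".Trash"}


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_tree(root: str, skip_dirs=frozenset(SKIP_DIRS)) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks, sockets, device nodes, etc.
            result[os.path.relpath(full, root)] = sha256_file(full)
    return result


def write_checksums(checksums: dict[str, str], out_path: str) -> None:
    """Write 'digest  path' lines sorted by path so two listings line up in a diff."""
    with open(out_path, "w", encoding="utf-8") as out:
        for rel in sorted(checksums):
            out.write(f"{checksums[rel]}  {rel}\n")
```

You would run checksum_tree once on the source and once on the destination, write both results with write_checksums, and then compare the two files with vimdiff or diff.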
Using a few Python scripts to generate and process the source and destination checksums allows me to, at the end, use a simple
vimdiff source.checksums destination.checksums
to quickly see the few differences that remain.
Keep in mind that the perfect is the enemy of the "good enough". Just using the basic 'find' (and not a custom Python script) is plenty good. I used that method for many years.
Many errors stick out right away. E.g. if you have 1,000,000 files checksummed in your source but only 250,000 in your destination, then you quickly know that you screwed up.
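You can also script the comparison step rather than eyeballing it. This sketch assumes both listings use one "digest  relative/path" line per file, the layout shasum -a 256 prints. read_checksums, compare and summary are hypothetical helper names. The sketch lists files missing from the destination, extra files, and files whose checksums differ. It also gives the raw counts that make a 1,000,000 vs. 250,000 mismatch obvious at a glance.

```python
def read_checksums(path: str) -> dict[str, str]:
    """Parse 'digest  relative/path' lines into {path: digest}."""
    sums = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            digest, rel = line.split("  ", 1)  # split on the first two-space gap
            sums[rel] = digest
    return sums


def compare(source: dict[str, str], dest: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between two {path: digest} listings."""
    return {
        "missing": sorted(source.keys() - dest.keys()),
        "extra": sorted(dest.keys() - source.keys()),
        "changed": sorted(p for p in source.keys() & dest.keys() if source[p] != dest[p]),
    }


def summary(source: dict[str, str], dest: dict[str, str]) -> str:
    """One-line overview; a large 'missing' count means the backup is broken."""
    diff = compare(source, dest)
    return (f"source: {len(source)} files, destination: {len(dest)} files, "
            f"missing: {len(diff['missing'])}, extra: {len(diff['extra'])}, "
            f"changed: {len(diff['changed'])}")
```

Expected differences, such as the cache files mentioned above, will show up under "missing". You can filter them out before you look at what is left.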
Not trying to troll, but I see so many HN comments about backups and I just don't have this need anymore. What are people using traditional backup software like time machine, carbon copy cloner, etc. for on their laptop these days?
I use Google Docs for all my docs and spreadsheets. Occasionally I use Excel, Word or Keynote for files, but if I do, I save them to my Dropbox or Google Drive folder. I have my photos synced with Google Photos, my music is synced through iTunes Match (or I can sync my music library files to Dropbox, or use Spotify), and all my code is in git repos pushed to GitHub or Bitbucket.
What else is there to back up? Is it a matter of not wanting to re-install apps manually to get your computer back to its current state? Or is it a matter of not wanting to pay for the extra storage space on dropbox?
- Some people don't want to use the cloud to store their stuff for privacy / ethical / commercial / legal reasons.
- Some others have been burnt by the cloud failing them (corruption, data loss, copyright abuse, etc.) and want an additional layer of security.
- There are many activities that are not covered by the cloud at all, such as 3D / video / music editing, art in general, and programming (all those dev envs)...
- You may want / need to be able to recover from data loss even if you are offline.
- Some file types don't match the cloud sync paradigm very well, such as system and configuration files, offline game saves, and anything that needs to be at a precise place on the system.
- You have stuff you want to manipulate as files in dirs, not as entries in an app. Power users usually dislike the loss of control and freedom that apps imply.
- You like to have 3 backups because of the rule "one on site, one off site".
- You like to have all your backups in the same place.
- You have collections of files that just don't fit in the cloud, such as terabytes of videos.
- Sex tapes still need to be backed up, and you won't send those to your Dropbox.
- Some people still have a shitty internet connection and can't sync reliably.
- It's less work to manage one backup than to set up all the parameters of all those apps to be sure they sync only what you want.
- You read the licence for some sync platform and couldn't decently click "I accept".
- You have a NAS at home into which you plug the hard drive with the backups for the whole family.
It's not only paranoid people who keep offline backups, it's those that understand the threat models where data can be destroyed through logical or physical access to all storage locations.
For argument's sake, I'll focus on Google services. As I personally discovered recently, Docs offers minimal protection for your data:
1. Docs shared with you (others are "owner") can disappear without notice.
2. Manual clones are the only way to keep a copy of shared documents.
3. The GDrive agent keeps no local copies of docs, only urls.
4. Any deletions made >25 days ago are unrecoverable.
5. Deleted accounts have only a 5 day undelete window.
When you combine these it means a document you've contributed to may quickly disappear forever as the result of mis-actions made by someone else or their employer/educational institution. You may have assumed "all changes saved to drive" meant into your account, but you'd be wrong.
Similarly removal of items from trash in GMail/GPhotos is permanent. If someone maliciously gained access to your Google account (or one of your devices) they could quickly and easily purge your data. These deletions would quickly and efficiently propagate to all your devices and purge the canonical "cloud" copy.
The scenario you describe involves no true "backups". It's probably better to say that you only have original copies, albeit resilient to failure of local hardware. Cloud services have data-loss events, due to both technical and business reasons. They're vulnerable to a variety of SPOFs out of your control or visibility.
I've known so many people over the decades who trusted some online provider to reliably store the only copy of critical data, only to be burned. Even providers that you would have thought should be bulletproof.
My rule of thumb, based on data loss studies, is to have three copies of any critical data. Any cloud provider should only be considered to be one copy.
I agree: cloud-based backups like Dropbox just sync in the background. Got any source code? Store it in GitHub or Bitbucket.
Documents and source code don't take up much space, unless you want music and video files, which iTunes can always sync to your Apple mobile device for you.
I use a USB hard drive to back up my Thunderbird and Firefox profiles. Also my downloads and other stuff too big for Dropbox. $100 for a 4T USB 3.0 hard drive is cheap.
Sync is not backup. If you corrupt or delete a file, the destructive action will get propagated as well. Time machine and many other backup systems gives you access to previous versions of files and folders.
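Here is a toy in-memory model of that difference. It is purely illustrative and says nothing about how Time Machine or any sync client actually stores data. A sync target mirrors the current state, so a deletion or corruption gets copied over too. A versioned backup appends a snapshot per run, so it can still hand back older contents. The class and method names are hypothetical.

```python
class SyncedFolder:
    """Mirror of the local state, like a sync client's replica."""

    def __init__(self) -> None:
        self.replica: dict[str, bytes] = {}

    def sync(self, local: dict[str, bytes]) -> None:
        # Mirror the current state: deletions and corruption propagate too.
        self.replica = dict(local)


class VersionedBackup:
    """Keeps a separate snapshot of the local state for every backup run."""

    def __init__(self) -> None:
        self.snapshots: list[dict[str, bytes]] = []

    def backup(self, local: dict[str, bytes]) -> None:
        self.snapshots.append(dict(local))

    def versions(self, name: str) -> list[bytes]:
        """Versions of `name` across snapshots, oldest first, skipping
        consecutive runs where it was unchanged."""
        seen: list[bytes] = []
        for snap in self.snapshots:
            if name in snap and (not seen or seen[-1] != snap[name]):
                seen.append(snap[name])
        return seen
```

After a file is corrupted and then deleted, the synced replica no longer has it at all. The versioned backup still returns both the good and the corrupted contents.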
CCC was what I used back in the day (IIRC it used to be free) to make the most reliable bit-for-bit backups of OS X machines. Time Machine skipping over so many files is kind of a nuisance to me.
Unless you tell Time Machine to exclude certain drives/volumes/folders, you shouldn't really miss anything with a Time Machine backup. It usually ignores files that aren't necessary and can be recreated by the OS or applications (like caches or temporary files). What kinds of personal files did you lose with a Time Machine backup?
Of course you can see the full list at /System/Library/CoreServices/backupd.bundle/Contents/Resources/StdExclusions.plist.
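For checking whether a particular folder is covered, here is a sketch built on Python's standard plistlib module. It deliberately makes no assumption about which keys StdExclusions.plist uses. It just collects every string in the file, so non-path strings may show up as well. Two things are assumptions: the helper names, and the rule that entries without a leading "/" are relative to the home directory.

```python
import os
import plistlib

STD_EXCLUSIONS = ("/System/Library/CoreServices/backupd.bundle/Contents/"
                  "Resources/StdExclusions.plist")


def collect_strings(obj) -> list[str]:
    """Recursively gather every string in a parsed plist (dicts, arrays, leaves)."""
    if isinstance(obj, str):
        return [obj]
    if isinstance(obj, dict):
        return [s for v in obj.values() for s in collect_strings(v)]
    if isinstance(obj, list):
        return [s for v in obj for s in collect_strings(v)]
    return []


def load_exclusions(path: str = STD_EXCLUSIONS) -> list[str]:
    with open(path, "rb") as f:
        return collect_strings(plistlib.load(f))


def is_excluded(path: str, excluded: list[str], home: str = "~") -> bool:
    """True if path equals, or lies under, one of the excluded entries.
    Entries without a leading '/' are assumed to be relative to home."""
    target = os.path.normpath(os.path.expanduser(path))
    for entry in excluded:
        base = entry if entry.startswith("/") else os.path.join(os.path.expanduser(home), entry)
        base = os.path.normpath(base)
        if target == base or target.startswith(base + os.sep):
            return True
    return False
```

On a Mac you could call is_excluded("~/Library/Application Support/MobileSync/Backup", load_exclusions()). That is a quick way to check the iPhone-backup question raised in the replies.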
One of the biggest is iPhone backups from iTunes. If I were a normal user and my MacBook didn't boot up tomorrow, I'd be surprised to find my backup didn't include critical files like this. For example, apps that are no longer in the App Store but are in your backup can be restored. Once you don't have that backup anymore, they can't.
It's just one of the reasons Apple users are forced to have multiple backup services if they want reliable and complete backups.
I'm not sure why the iPhone backups on your system didn't get backed up. The device backups stored in ~/Library/Application Support/MobileSync/Backup/ are not in the exclusions file that I see.
P.S.: The latest app binary files may not get updated/synced on the Mac post iOS 9/iTunes upgrade.
I double checked my drive and you're correct with regard to mobile backups. That's reassuring.
It would be nice to have a utility to give a definitive diff of what's on your hard drive that's not on your: Time Machine drive / [Carbonite, Backblaze, etc] backup.
AFP depends on having persistent, globally addressable IDs for files (CNIDs). Perhaps this is no longer available under APFS?
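A persistent ID lets a client hold a reference to a file by number and still find it after the file is renamed or moved, which is why AFP wants them. The toy table below keys IDs on (st_dev, st_ino) from os.stat, and that pair survives a rename on the same volume. It is a hypothetical illustration, not netatalk's CNID database or the HFS+ catalog. In particular, inode numbers can be reused after a file is deleted, and a real scheme has to guard against that.

```python
import os


class ToyCNIDTable:
    """Hands out a stable integer ID per file, keyed on (device, inode)."""

    def __init__(self, first_id: int = 1) -> None:
        self._next = first_id
        self._by_key: dict[tuple[int, int], int] = {}

    def id_for(self, path: str) -> int:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)  # unchanged by a rename within one volume
        if key not in self._by_key:
            self._by_key[key] = self._next
            self._next += 1
        return self._by_key[key]
```

If you rename a file and look it up again, you get the same ID. A newly created file gets a fresh one.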
While I imagine this might be retrofittable (netatalk manages it, after all), I can't blame Apple for wanting to ditch AFP. It's an ancient protocol at this point, and I'm frankly just amazed it still works at all.
Crap driver? It's not a 'driver', and smb has been natively supported for at least 8-10 years. It hasn't sucked in a looong time either. In fact recent OS versions have defaulted to smb v2 for file sharing.
I believe when they flipped the switch to default to SMB it was actually to SMB 3 in 10.10.
Anecdotally, I'd always found AFP in the Tiger and Leopard days to be faster than whichever version of SMB support was included at the time. Now I use the default SMB3 and it seems that 802.11ac and gigabit are the bottlenecks (of course it's 10 years later, in the age of SSDs as well).
And it wasn't just anecdotally faster; I worked for a storage company specializing in Mac workflows, and AFP was empirically several times faster, especially on 1GbE and 10GbE networks. This was in part due to Apple ditching Samba in Lion/10.7 (http://appleinsider.com/articles/11/03/23/inside_mac_os_x_10...) over GPL concerns and replacing it with their own shitty, incomplete implementation, which they didn't bring up to Samba's standards until 10.10.
I remember this being excruciating since customers had to either buy a third-party SMB implementation like DAVE to get any value out of 10GbE connections, or hope that the applications they wanted to use over the network supported AFP.
The spartan description of APFS certainly sounds like the (partial) feature list for ZFS--the comparisons made in the comments here are on-point. ZFS though took around 5 years to ship and, arguably, another 5-10 to get right. I say this having shipped multiple products based on ZFS, writing code in ZFS, and diagnosing production problems with it.
On-disk consistency ("crash protection"), snapshots, encryption, and transactional interfaces ("atomic safe-save") will no doubt be incredibly valuable. I don't think, though, that APFS will dramatically improve upon the time it took ZFS to mature from a first product to world-class storage.
Some commenters have opined that (despite Apple distributing ZFS for Mac OS X at WWDC nearly a decade ago) ZFS would never be appropriate for the desktop, phone, or watch. True, ZFS was designed for servers and storage servers, but I don't think there's anything that makes it innately untenable in those environments, not even its default (but not essential) use of lots of RAM.
Who knows... maybe Apple has spent the decade since killing their internal ZFS port taking this new filesystem through its paces. Its level of completeness, though, would suggest otherwise.
Between the ZFS-like features and the advertised "novel copy-on-write metadata" scheme, I would not be surprised if it was partially based on DragonFly's HAMMER/HAMMER2 filesystem.
If you poke around in the APFS kernel extension, it's not a very big binary, given the feature set. (It's a 550K extension, compared to the 2.5MB zfs.ko on FreeBSD.) I haven't disassembled it yet, but I'd wager that APFS is layered atop Core Storage btrees/CoW. Since that code's been shipping since 2011, maybe APFS stabilizes faster than a true greenfield filesystem?
(That does leave me wondering how interesting an open-source APFS would be without an open-source Core Storage.)
Yes. I was a heavy user of ZFS on Mac OS X back then. I was on the developer mailing list for that implementation, submitting bugs and talking to Apple developers who were working like mad to ship it.
Then Steve Jobs' buddy Larry bought Sun, and licensing of ZFS became basically impossible to sort out on time. So they dropped it.
The Sun acquisition was announced in 2009 and closed in 2010. It may have been related, but I have my own suspicions about what caused Apple to jettison their port.
Given the mountain of legal shit Google has been subjected to over Java, the last thing Apple wanted was to have to answer to Larry's lawyers. ZFS, awesome as it is, has been poisoned.
Yeah, I bet it's been in development for a while. People have been seriously bitching about HFS+ for at least a decade, and with good reason. Apple has been pretty much silent about it apart from briefly testing ZFS and I'd wager a guess that they started working on it shortly after they abandoned that, while they continued also bolting more stuff onto HFS+ in the meantime.
Unless I'm misreading, the page seems to indicate that only external volumes can be formatted with APFS. My guess is that they'll have limitations and HFS-defaults for several years, at least on macOS, as the file system matures.
Either way, what file system didn't take years to get right? There are so many possible edge cases with file systems that sorting them out not only takes a long time, but also takes an amazing community and/or luck to reproduce them deterministically so they can be fixed.
File systems take A LONG time to stabilize, and you want them stabilized well: if there is a bug in httpd, you just restart it, but if there is a bug in the filesystem, you lose your data. See BTRFS. It was started in 2007 and it is still not stable enough to be supported in RHEL (and for good reasons). So, of course, Apple engineers are way better than the rest of the world and they have magical pixie dust which makes bugs disappear, so they will fix bugs way faster than anybody else. Not.
It's good to see Apple catching up, no matter where it's coming from. You mention ZFS, but per-file encryption (EFS) and snapshots (VSS) in particular stood out to me as features NTFS has had for a decade.
Anyone know who works on APFS? If I were Apple I would have certainly picked up some of the ZFS core team, curious if any of them are currently at Apple.
Could speed up their time to market with such seasoned hands on board.
It seems ZFS was a no-go because of its license. But why does Apple develop its APFS rather than port the open source BTRFS with all its features? NIH syndrome?
Another reason is that ZFS just isn't fitting for Apple devices. It's memory hungry and energy hungry and has several limitations compared to HFS+ like not being able to be resized.
ZFS will be happy on a system with only 512MB of RAM, no matter how much storage it manages (assuming only 1 pool). It does need more RAM than UFS, but the amount is not notable unless we are talking about systems with 32MB of RAM.
Being energy hungry relative to UFS and others is likely true due to things like checksum calculations and compression, but there is no way to implement these things without needing more cycles to compute them.
> Being energy hungry relative to UFS and others is likely true due to things like checksum calculations and compression, but there is no way to implement these things without needing more cycles to compute them.
Not so true now - people have added encryption and compression instructions to CPUs. I'd be surprised if Apple couldn't ask Intel for a couple opcodes, and with the mobile platforms they do it anyway.
It is platform dependent. ZFS does not do that on Linux yet in part because of GPL symbol restrictions and the fact that there are other things to develop right now, although there has been some work done in this area to use the instructions directly. It definitely takes advantage of them on Illumos. I am not sure about the other platforms.
xnu's loadable kext isolation means that you take a one-time hit for using anything beyond x86-64+sse2. It can be paid at kernel prelink time, at kext load time, or while a kext is running, via a trap that switches the call preamble/postamble to handle the extra state (which also makes it possible to select fast paths on a cpu-by-cpu basis, for example). Only the presence of x87 insns imposes a noticeable cost.
o3x builds and runs just fine with -O2 -march=native and the latest clang just by changing CC and CFLAGS; the kexts that get built aren't backwards compatible though (you'll get a panic if you build with -march=native on a machine that does AVX and run on a machine that doesn't).
The code that recent clang+llvm generates makes heavy use of the XMM and YMM registers, and does some substantial vectorization. The compression and checksumming and galois field code that's generated is strikingly better, although not quite as good as the hand tuned code in e.g. (https://github.com/zfsonlinux/zfs/pull/4439). It may be interesting to compare performance, but given that compression=lz4 and checksum=edonr has negligible CPU impact on a late 2012 4-core mac mini (core i7) even when doing enormous I/O (> 200k IOPS to a pair of Samsung 850 PROs), hand tuning likely won't make as much of a difference as moving up from compression=on, checksum=[sha256|fletcher4].
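To show why fletcher4 is so cheap compared to sha256, here is a plain-Python sketch of the Fletcher-4 algorithm behind checksum=fletcher4. It keeps four 64-bit running sums over the data read as 32-bit words, so each word costs only four additions. The sketch assumes little-endian word order (the native order on x86) and a buffer length that is a multiple of 4. ZFS's own implementations are vectorized C, and this sketch makes no attempt to reproduce them.

```python
import struct

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


def fletcher4(data: bytes) -> tuple[int, int, int, int]:
    """Fletcher-4: running sums a, b, c, d over 32-bit little-endian words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (w,) in struct.iter_unpack("<I", data):
        a = (a + w) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d
```

The four accumulators are position-sensitive (b, c and d weight earlier words more heavily), so reordered words change the result even though the whole thing is just additions. That is the trade-off the comment alludes to when it lists fletcher4 next to sha256.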
I'm pretty sure that once the hand tuned stuff is in ZOL it'll get looked at by lundman for possible integration.
I'd be surprised if it doesn't. AVX is really good at speeding up compression/checksumming algorithms, and AESNI is standard in most AES implementations nowadays.
Because not every opcode is made public. The "usual suspects" for SIMD and encryption are public, yes, but nothing stops Intel from adding opcodes so highly specialized that they essentially represent the exact program code of the filesystem.
ZFS gobbles RAM and almost certainly couldn't be made to run acceptably on the Apple Watch. No, this seems like something developed from scratch to meet their particular needs.
ZFS needs very little memory to run. Performance is definitely better with more RAM, but the overwhelming use of memory in ZFS is for cache. Eviction is not particularly efficient due to the cache being allocated from the SLAB allocator, but that is changing later this year.
Getting ZFS to run on the Apple Watch is definitely possible. I am not sure what acceptably means here. It is an ambiguous term.
I've run the initial FreeBSD patchsets for ZFS support on a dual-P3 with 1.5 GB of RAM, so 1 GB for a recent version should be more than doable. ZFS on FreeBSD has gotten a lot better in low-memory situations.
There are two additional things to consider.
ZFS uses RAM mostly for aggressive caching to cover over both spinning disks and the iops tradeoff vdevs make over traditional raid arrays. Thus low memory is not such a big deal if you have a pool with a single SSD or NVMe device.
The other point to consider is that on at least all non-Solaris-derived platforms, the VFS layer does not speak ARC. So data is copied from an ARC object into a VFS object, taking up space in both. If you are able to adapt your platform to use the ARC directly as the VFS cache, you can save RAM that way as well.
The wiki page should be corrected. Saying "lots of memory" is somewhat ambiguous. If this were the 90s, then it would be right.
As for the recommended amount of system memory, recommended amounts are not the minimum amount which code requires to run. It in no way contradicts my point that the code itself does not need so much RAM to operate. However, it will perform better with more until your entire working set is in cache. At that point, more RAM offers no benefit. It is the same with any filesystem.
ZFS de-duplication will eat all of the RAM you can throw at it.
Otherwise, it is basically the SLUB memory block allocator that was used in the Linux kernel for a while. So yes, it can run on watchOS-level amount of RAM.
ZFS data deduplication does not require much more RAM than the non-deduplicated case. Performance will depend heavily on IOPS when the DDT entries are not in cache, but the system will still run, albeit slowly, even with minuscule RAM.
Kernel memory on the platforms where ZFS runs is not subject to swap, so something else happened on that system. The code itself is currently somewhat bad at freeing memory efficiently due to the use of SLAB allocation. A single long lived object in each slab will keep it from being freed. That will change later this year with the ABD work that will switch ZFS from slab-based buffers to lists of pages.
If dedup is off and the max ARC size is limited, it will use little memory (e.g. 512 MB of RAM for a 2x2TB RAID1 pool). I can say that from my own experience; I tried both approaches.
I should probably clarify that the system could definitely run unacceptably slowly when deduplication is used and memory is not sufficient for the DDT to be cached. My point is that saying RAM is needed amounts to saying the software will not run at all without it, which is not true here.
Or when the cache is cold. It REALLY hurts to reboot while a deferred destroy on a big deduplicated snapshot is in progress. No import today for you!
Well, unless your medium has no seek penalty, which is what hurts with deduplication. Dedup on SSDs is pretty much OK, as long as your checksum performs reasonably (skein is reasonable; sha256 is not).
DDTs that fit inside no-seek-penalty L2s don't hurt that much either, and big DDTs on spinny-disk pools are acceptable with persistent l2arc, although it's risky because if the l2 fails, especially at import, you can have a big highly deduplicated pool that isn't technically broken but is fundamentally useless if not outright harmful to the system it's imported (or ESPECIALLY attempting to be imported) by. "No returns from zpool(1) or zfs(1) commands for you today!"
When eventually openzfs can pin datasets and DDTs to specific vdevs (notably ones made out of no-seek-penalty devices), heavy deduplication on big spinny disk pools should be usable and reliable.
Until then, "well technically even if you have only ARC and it's very small, it will work, just slowly" while correct in the normal case, is unfortunately hiding some of the most frustrating downsides when things go wrong.
That's if you're running deduplication (which is generally considered pointless for general purposes, it works very well for some data-loads but you really need to bench it beforehand considering its cost)
> Interesting, because on Reddit's /r/DataHoarder they recommend a "1GB RAM per terabyte of storage" rule of thumb.
The author meant deduplication, but that recommendation is wrong. A rule of the form "X amount of RAM per Y amount of storage" that applies to ZFS data deduplication is a mathematical impossibility.
You could need as little as 40MB of RAM per TB of unique data stored (16MB records) or as much as 160GB of RAM per TB of unique data stored (4KB records), both assuming default ARC settings. Notice that I say unique data and discuss records rather than simply saying data. There is a difference between the two. If you want to deduplicate data and maintain a certain level of performance, you will want to make sure RAM is sufficient for a relatively high hit rate on the DDT. You can read about how to do that in my other post:
It is not straightforward, and it depends on knowing things about your data that you probably do not. There is no magic bullet that will make data deduplication work well in every workload or make its requirements easy to calculate. However, if the data is already on ZFS, the zdb tool has a function that can figure out what the deduplication ratio is, provided there is sufficient RAM for the DDT, which makes it impractical to run on a pool that is large relative to system memory.
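The two RAM figures quoted above turn out to be consistent with a single per-record cost. Assume binary units (MB = 2^20 bytes, and so on) and one DDT entry per unique record. Both figures then work out to 640 bytes of RAM per entry. That constant is inferred from the comment's numbers, not taken from ZFS documentation. Under the same assumption, the default 128 KiB recordsize would need about 5 GiB of RAM per TiB of unique data.

```python
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40


def ddt_ram_bytes(unique_bytes: int, record_size: int, bytes_per_entry: int) -> int:
    """RAM needed to keep one DDT entry per unique record cached."""
    records = -(-unique_bytes // record_size)  # ceiling division
    return records * bytes_per_entry


def implied_bytes_per_entry(ram_per_tib: int, record_size: int) -> int:
    """Back out the per-entry cost from a 'RAM per TiB of unique data' figure."""
    return ram_per_tib // (TiB // record_size)


# The comment's two data points (binary units assumed):
small_records = implied_bytes_per_entry(160 * GiB, 4 * KiB)   # 2**28 records
large_records = implied_bytes_per_entry(40 * MiB, 16 * MiB)   # 2**16 records
```

Both implied values are the same, which is why no single "X GB of RAM per TB" rule can hold. The record count, not the raw capacity, drives the DDT size.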
ZFS' data deduplication is a very strict implementation that attempts to deduplicate everything subject to it against everything else subject to it and do so under the protection of a merkle tree. If you want it to do better, you will have to either give up strong data integrity or implement a probabilistic deduplication algorithm that misses cases. Neither of which are likely to become options in ZFS.
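A toy content-addressed table shows the shape of what is being described. Every deduplicated write hashes the block and looks the hash up in one global table. The write then either bumps a reference count or stores the block once. This is an in-memory illustration with hypothetical names, not ZFS's DDT, which lives on disk, is protected by the pool's Merkle tree, and is only cached in RAM. That lookup on every write is exactly what turns into random I/O when the table is not cached.

```python
import hashlib


class ToyDedupTable:
    """Content-addressed block store with reference counts (illustrative)."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}   # digest -> stored block
        self.refcount: dict[bytes, int] = {}   # digest -> number of references

    def write(self, block: bytes) -> bytes:
        key = hashlib.sha256(block).digest()
        if key in self.refcount:
            self.refcount[key] += 1            # duplicate: only bump the count
        else:
            self.blocks[key] = block           # unique: store the data once
            self.refcount[key] = 1
        return key

    def free(self, key: bytes) -> None:
        self.refcount[key] -= 1
        if self.refcount[key] == 0:            # last reference gone: drop data
            del self.refcount[key]
            del self.blocks[key]
```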
Anyway, deduplicating writes in ZFS is IOPS intensive, which is the origin of poor performance. There are 3 random seeks that must be done per deduplicated write IO. If the DDT is accessed often, it will find its way into cache and if all of those seeks are in cache, then your write performance will be good. If they are not in cache, you often end up hitting hardware IOPS limits on mechanical storage and even solid state storage. That is when performance drops.
If you are writing 128KB records on a deduplicated dataset on hardware limited to 150 IOPS, you are only going to manage 6.4MB/sec when you have all cache misses. If your records are 4KB in size, you will only manage 200KB/sec when you have all cache misses. However, ZFS will continue to operate even if every DDT lookup is a cache miss and you are hitting the hardware IOPS limit.
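The arithmetic above can be written as a small function. When every DDT lookup misses cache, each deduplicated write costs about three random I/Os, so throughput is (IOPS / 3) x record size. To match the comment's numbers, the function uses 1 KB = 1000 bytes. The three-I/O figure comes from the comment, not from a measurement.

```python
def dedup_write_throughput(record_bytes: int, device_iops: int,
                           seeks_per_write: int = 3) -> float:
    """Worst-case dedup write throughput in bytes/s with all DDT cache misses."""
    writes_per_second = device_iops / seeks_per_write
    return writes_per_second * record_bytes


# 150 IOPS: 128 KB records -> 6.4 MB/s, 4 KB records -> 200 KB/s
large = dedup_write_throughput(128_000, 150)
small = dedup_write_throughput(4_000, 150)
```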
The 1 GB per TB rule of thumb is there to handle the larger requirements of ARC caching, not of the FS itself.
It's a performance guide, not a requirement.
The GB/TB rule of thumb exists so that on-disk data can be staged in RAM before a write operation, and so that open or recent files can be precached in RAM while being streamed; it has less to do with the pool's total capacity.
For things like compression, hashing, encryption, block defragmenting and other operations, ZFS uses a lot of caching and indexing to avoid bottlenecks.
If Apple decides to implement APFS on RAID at the software level, e.g. to combat bitrot or to sell consumer/business NAS/SAN products (such as upscaled Time Machine services for VMs), there are going to be questionable setups and comparisons to FreeNAS, Synology, unRAID and other software storage options built on a mix of technologies and hardware, where Apple won't be flexible or adaptive.
Arguing for ZFS requires understanding more about ZFS usage and performance scenarios.
It is very possible to run ZFS RAID-Z1 on 2 GB of RAM or less, even for a 32 TB pool; e.g. anyone can run 5x Seagate 8 TB SMR archive drives in RAID-Z1 on 2 GB of RAM.
It is usable. FreeNAS regularly hosts builds on less-than-optimal hardware setups.
It's also usable with 4 GB, 8 GB, 16 GB, or 32 GB of RAM, with varying performance benefits as features are enabled and the cache is expanded to hold ARC or LRU (recently used) files/pages/blocks.
ZFS performance is usually measured as throughput from an empty pool up to 90% full, and it changes drastically across that range when cache is limited.
On a system like this with SMR "archive" drives, the problem is often having a reliable cache of write data and, ideally, fewer fragments to store asynchronously; i.e. writing large files or modifying a large block is disk-IO limited. If it is used to store archives, up to and including media files as a consumer device would, an optimal RAM size would be hard to guess, given that people might store Blu-ray or UHD ISO files of ~40 GB versus DVDs of 4-9 GB, and streaming reads/writes of linear files would not use significant random IOPS.
With DB or VM storage and its steady stream of block writes, the use case and performance requirements are different again, and this is where the 1 GB per TB rule is both useful and unhelpful for diagnosing requirements.
ZFS has a lot of potential bottlenecks, usually CPU, RAM and IOPS, but people focus on RAM since it is so much harder to expand or scale. And performance does not scale linearly with it.
Regardless, it's practically impossible to guess optimal usage, since under 2 GB there's almost no caching at all: the ARC is very limited, and kernel panics are possible when memory is not tuned or capped to keep the ARC from expanding. At that point performance usually depends on the CPU rather than the disks.
At the high end, performance can be managed in different ways, such as L2ARC, ZIL, more RAM, more CPU, or different pools, each with caveats and usually non-linear benefits.
Many NAS units that come with 2 GB of RAM are capable of running ZFS; the problem is performance.
It's even possible to run ZFS on less than 1 GB of RAM, but it's not going to be reliable or predictable unless you restrict the conditions of usage, e.g. limiting maximum file sizes or restricting vdevs or IOPS. It would require heavy tuning for the specific task.
Especially once you approach the pool's maximum capacity, performance can be brutal without caching features: lower than 100 KB/s when the ARC is busy or unoptimised. Whatever the CPU can deliver from the drives without an IO or file cache will usually be veeery slow on NAS-level hardware, because traditional NAS isn't CPU bound.
Essentially, once you can't enable or run the performance features, there's no benefit from ZFS or CoW on smaller embedded devices unless it is specifically needed.
From memory and experience, you can use half a GB of RAM per TB of storage on RAID-Z1 with some caveats and get usable performance, as long as you keep file sizes and IO in mind.
With 4 TB or larger drives, RAID-Z2 is recommended because of what a drive failure means for pool integrity: the rebuild/resilver times and the error probability alone could allow data to be changed or corrupted during the resilver.
This is about combating the entropy of reading terabytes of data and creating new checksums, given the error probabilities involved with magnetic storage. Current and future drive densities almost guarantee that errors will occur through the decay of magnetic storage.
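To put rough numbers on the resilver risk, here is a Python sketch. It assumes independent errors at a fixed rate of one unrecoverable read error (URE) per 10^14 bits, a figure often quoted on consumer drive spec sheets; treat it as an assumption, since real error behavior is burstier. It estimates the chance of reading the surviving drives of a hypothetical 5-drive RAID-Z1 vdev end to end without a single URE, assuming the drives are full (ZFS only resilvers allocated blocks, so emptier pools read less). With one drive gone, Z1 has no parity left, so each URE hit during the resilver is a damaged block; Z2 still has one parity to fall back on. `p_clean_read` is a hypothetical helper.

```python
import math


def p_clean_read(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of reading `bytes_read` bytes with zero unrecoverable
    read errors, assuming independent errors at a fixed per-bit rate."""
    bits = bytes_read * 8
    # (1 - p) ** bits, computed in log space to stay numerically stable
    return math.exp(bits * math.log1p(-ure_per_bit))


TB = 1e12
# Hypothetical 5-drive RAID-Z1: after one failure, 4 drives are read in full.
for drive_tb in (2, 4, 8):
    surviving_bytes = 4 * drive_tb * TB
    print(f"{drive_tb} TB drives: P(no URE) = {p_clean_read(surviving_bytes):.2f}")
```

The probability falls off exponentially with the amount of data read, which is why larger drives push the recommendation from Z1 toward Z2.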
With deduplication, ZFS needs to store multiple hashes and keep caches per device and per pool, which inflates the sizes involved. About 5 GB per TB is a good start. In most cases you will never need dedup, as it has an extreme cost and a narrow use case.
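The rules of thumb quoted above can be collected into one rough calculator. This is only a sketch: the per-TB factors come straight from the comments, `zfs_ram_estimate_gb` is a hypothetical name, and the thread itself notes that a 32 TB RAID-Z1 pool can run, slowly, on 2 GB.

```python
def zfs_ram_estimate_gb(pool_tb: float, dedup: bool = False,
                        minimal: bool = False) -> float:
    """Rough RAM estimate (GB) from the rules of thumb quoted in this thread.

    Performance guidelines only, not requirements:
      dedup enabled   -> ~5 GB per TB
      minimal RAID-Z1 -> ~0.5 GB per TB (usable, with caveats)
      otherwise       -> ~1 GB per TB, mainly to give the ARC room
    """
    if dedup:
        per_tb = 5.0
    elif minimal:
        per_tb = 0.5
    else:
        per_tb = 1.0
    return pool_tb * per_tb


for label, kwargs in [("baseline", {}), ("minimal Z1", {"minimal": True}),
                      ("dedup", {"dedup": True})]:
    print(f"32 TB pool, {label}: ~{zfs_ram_estimate_gb(32, **kwargs):g} GB")
```

Treat the output as a starting point for tuning, not a floor; workload (streaming media versus DB/VM blocks) moves the real number more than capacity does.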
> FileVault: APFS volumes cannot currently be encrypted using FileVault.
This one is confusing, because this is a logical volume feature. Not sure how or why APFS would ever care that some layer above it is encrypting stuff.
On the one hand, that may be an artificial limitation. If this turns out to have some bug that overwrites the encryption keys, they could of course say "we warned you", but it would not be good PR. Also, if beta developers report intermittent smaller data-losing bugs, they might want to study the affected drives to see what went wrong with them. Not having encryption enabled on them will make that a tiny bit easier.
On the other hand, if the new ability to partition drives with flexible partition sizes includes separate encryption keys per partition, and encryption/decryption is done by the block driver, they may have work to do to keep that block driver informed about what blocks should get encrypted with what key.
> APFS supports encryption natively. You can choose one of the following encryption models for each volume in a container: no encryption, single-key encryption, or multi-key encryption with per-file keys for file data and a separate key for sensitive metadata. APFS encryption uses AES-XTS or AES-CBC, depending on hardware. Multi-key encryption ensures the integrity of user data even when its physical security is compromised.
Could be for performance reasons rather than a technical blocker. That said, AES-NI instructions do make encryption pretty fast, so it's anyone's guess at this stage.
> It is optimized for Flash/SSD storage and features strong encryption, copy-on-write metadata, space sharing, cloning for files and directories, snapshots, fast directory sizing, atomic safe-save primitives, and improved file system fundamentals.
btrfs was available in the Linux kernel in 2009, but didn't see production release in the tinkerer-friendly distros until 2012 and in an enterprise distro until 2015. These things take time: filesystems need to be absolutely bulletproof, especially in the consumer space where (unlike Linux) most users will have no idea what to do if something goes wrong. I'd say Microsoft is still on schedule.
Just yesterday, I did my first Linux installation with ZFS as the root/boot filesystem (Ubuntu 16.04). This is after using it as the default filesystem on my FreeBSD systems for several years, and being very happy with it.
I've used Btrfs many times since the start, and been burned by dataloss-causing bugs each and every time, so I'm quite cautious about using or recommending it. I still have concerns about its stability and production-readiness. If in doubt, I'd stick with ext4.
I've had F2FS on an Android tablet for many years. Resurrected it. However, I'm running Debian on my laptop and I'm scared to try F2FS on / because I get warnings about it not being fully supported "yet". I would love to have an SSD-optimized FS on Linux. Since AAPL will open source the release version, is it conceivable that APFS could replace ext4 as the default Linux FS?
Without getting stability and reliability correct don't bother with the other features. What good is it if the filesystem handles oodles of drives if none of them have any of the data you put on them?
I don't know about that. Apple switched processor architectures twice, and both times software written for the old arch ran on the new one. And when they replaced their entire operating system, they not only made it so you could still program against the old API—just recompile and go—they also made it possible to run the old OS inside the new one so you could still run apps that hadn't yet been recompiled.
And before that, when Apple made the 68k -> PPC transition in the mid-90s, they ran the whole system under a crazy, bizarre emulator that allowed for function-level interworking - 68k code could directly call PowerPC code, and vice versa. Early PowerPC systems were in fact running an OS that was mostly composed of 68k code; it wasn't until 1998 or 1999 (around the release of the iMac) that most of the Toolbox was ported to PowerPC.
In the past, nobody did a better job of backwards compatibility than Microsoft.
Lately, Microsoft is showing that they aren't afraid to break things in the name of progress. If W10 is indeed the last version of Windows, maybe that's okay.
But wasn't that at the expense of clarity for new developers? I remember a horrible graduation exam where I had to code in Visual Studio.
The harshest part was not the coding or the UI; it was determining which versions of the different Windows APIs had a remote chance of working smoothly together. (It involved DB drivers and data grids.)
Perhaps. But I suspect there's a lot more extending and maintaining existing software than writing new software. For the former, backwards compatibility makes a huge difference.
OTOH, Microsoft has a terrible track record of overpromising and underdelivering on their next-gen file system. I give Apple the benefit of the doubt here. It is worded so that most of the limitations clearly sound related to this being a preview release.
Control over hardware doesn't really buy you anything here. Just about any hardware can use any filesystem with, in the worst case, the requirement that you have a small boot partition using the legacy filesystem.
Interestingly with SSD storage devices, control of the hardware can help a lot more as it can become possible to categorize, fully explore and if needed, ensure a particular behavior of commands like TRIM. Other filesystems have the unenviable task of running on any random piece of storage you throw at it, including things where the firmware straight up lies, or the hardware delays non-volatility past the point the filesystem assumes (potentially producing data loss in a crash) or similar types of problems.
Anyway. Overall, I think it's safe to say hardware control doesn't make most of filesystem development much simpler or easier. But there's a few interesting places it arguably does!
That doesn't really change anything about the filesystem design. A storage device can fail to write data it claims to have because of damage as well as design defects. When that happens, a reliable filesystem will detect it and a less reliable filesystem will catch on fire.
It also doesn't help to control 100% of the built-in storage if anybody can still plug in $GENERIC_USB_MASS_STORAGE_DEVICE and expect to use the same filesystem.
Many filesystems exist that do not run on a "plain" read/write block device, because storage based on flash is more complicated than the old random-sector-access magnetic hard drives. See for example UBIFS and JFFS2 on Linux.
Having full and direct low-level control of on-board SSDs could very well be advantageous for performance and longevity of the flash on modern macbooks. Things like combining TRIM with low-level wear leveling etc.
Taking advantage of the differences between flash and spinning rust only requires that you know which one you're running on.
Moving the wear leveling code into the OS where the filesystem can see it is an interesting idea but why aren't we doing that for all SSDs and operating systems then?
(Raw) flash and spinning rust are fundamentally different, because spinning-rust drives provide READ SECTOR and WRITE SECTOR primitives, while raw flash provides READ SECTOR, ERASE (large) BLOCK and WRITE (small) SECTOR primitives. Filesystems like UBIFS do try to move the wear-leveling code into the OS. But the big players like Windows' NTFS and Mac's HFS were originally designed for the spinning-rust primitives, so I guess vendors of flash storage (SSDs, USB sticks, etc.) had to provide a translation layer that emulates the spinning-rust primitives on top of the NAND flash primitives. I'm sure various NAND flash vendors have different characteristics / spare blocks / secret sauce / defects that are masked by proprietary firmware, and probably see a significant business advantage in keeping those secret. Even building knowledge into USB stick firmware, where FAT is a likely filesystem, that a FAT filesystem rewrites its file allocation table far more heavily than file contents could prove an advantage. So being the single vendor behind the entire stack, from the raw NAND flash to the motherboard it's soldered onto to the OS, is likely very advantageous.
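A toy Python model of the translation layer described above, assuming page-granular programming, block-granular erases, and a naive first-free-page allocator. Real FTLs add garbage collection, wear-leveling policy, and power-loss safety; `ToyFTL` and its methods are hypothetical. It shows why an overwrite becomes a remap plus a stale page, and why an erase forces live data to move.

```python
class ToyFTL:
    """A deliberately tiny flash translation layer (FTL), for illustration only.

    The raw NAND here supports: read a page, program an *erased* page, and
    erase a whole block of pages. The FTL emulates a disk's overwrite-in-place
    WRITE SECTOR by programming a fresh page and remapping the logical sector.
    """

    def __init__(self, blocks: int = 4, pages_per_block: int = 4):
        self.ppb = pages_per_block
        self.pages = [None] * (blocks * pages_per_block)  # None means "erased"
        self.erase_counts = [0] * blocks                   # input to wear leveling
        self.map = {}         # logical sector -> physical page index
        self.stale = set()    # physical pages whose data has been superseded

    def _free_page(self) -> int:
        for i, data in enumerate(self.pages):
            if data is None:
                return i
        raise RuntimeError("no erased pages left; a real FTL would garbage-collect")

    def write(self, sector: int, data: bytes) -> None:
        new = self._free_page()           # NAND cannot overwrite a programmed page
        self.pages[new] = data
        old = self.map.get(sector)
        if old is not None:
            self.stale.add(old)           # old copy lingers until its block is erased
        self.map[sector] = new

    def read(self, sector: int) -> bytes:
        return self.pages[self.map[sector]]

    def erase_block(self, block: int) -> None:
        start, end = block * self.ppb, (block + 1) * self.ppb
        # Erase granularity is the whole block, so live data must survive it.
        # (Held in RAM here; a real FTL copies it to another block first.)
        live = {s: self.pages[p] for s, p in self.map.items() if start <= p < end}
        for p in range(start, end):
            self.pages[p] = None
            self.stale.discard(p)
        self.erase_counts[block] += 1
        for sector, data in live.items():
            del self.map[sector]
            self.write(sector, data)
```

Writing sector 0 twice leaves one stale page behind; erasing that block reclaims it and relocates the live copy. This bookkeeping is exactly what firmware hides behind a READ/WRITE SECTOR interface, and what a filesystem like UBIFS handles itself.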
Apple's EFI firmware has an HFS driver built into it. Today's macOS boots like this: the firmware reads the bootloader off the boot partition that Core Storage installations create these days, and the bootloader (more correctly, OSLoader) is what enables the firmware pre-boot environment to read Core Storage (encrypted or not) and thus find and load the kernel and kext cache, and then boot.
This is a developer release. It's hardly likely that Apple is sinking dev resources into evolving a new OS filesystem without planning on the bootloader and backup functionality also being in place for the final release.
Copy-on-write and snapshots are excellent building blocks for Time Machine, so much in fact that, when it was announced, I suspected it was because of ZFS (which, at the time, was being considered for OSX). It's very likely TM will be adapted to work on it (with about 20 lines of code)
It's long overdue, but having spent months developing an implementation of bootable snapshots on os x that works with HFS+, (http://macdaddy.io/mac-backup-software/) this kind of stings.
I'm not sure about snapshots; TM is made to work by just copying the directory structure over, and each backup is a fully functioning hierarchy. But cloning is definitely a big deal.
Time Machine uses hard links to create a duplicate directory structure without duplicating all the files themselves; only changed files need to be copied.
As I recall, HFS+ was modified to support directory hard links, which are uncommon in the Unix world, explicitly to support this feature.
TM also maintains a folder called /.MobileBackups to store temporary backups while your backup drive isn't connected. OS X also maintains /.fseventsd, a log of file system operations that TM can use to perform the next incremental, instead of having to compare each file for modifications.
It doesn't "just" copy the directory structure over each time: it creates the structure for each backup, copies in any files that have changed, and hard-links the ones that haven't to the existing copies.
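A minimal Python sketch of the hard-link scheme described above, assuming a POSIX filesystem that supports `os.link`. It is not Time Machine's implementation: it compares file contents instead of consulting /.fseventsd, and it recreates every directory rather than hard-linking unchanged directories the way HFS+ allows. `backup` is a hypothetical helper.

```python
import filecmp
import os
import shutil
from typing import Optional


def backup(source: str, dest_root: str, name: str,
           previous: Optional[str] = None) -> str:
    """Create dest_root/name as a full tree mirroring `source`, hard-linking
    every file whose contents match the same path in the `previous` backup."""
    target = os.path.join(dest_root, name)
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        # Directories are always recreated: most Unix filesystems forbid
        # directory hard links.
        out_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(out_dir, exist_ok=True)
        for fn in filenames:
            src = os.path.join(dirpath, fn)
            dst = os.path.join(out_dir, fn)
            old = os.path.normpath(os.path.join(previous, rel, fn)) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: new name, same inode, no data copied
            else:
                shutil.copy2(src, dst)  # new or modified: store a fresh copy
    return target
```

Each backup directory can be browsed as a complete snapshot, yet an unchanged file costs only a directory entry. That is also why deleting an old backup never breaks a newer one: the data survives as long as any link to it remains.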
A snapshot is a copy of the file (and its blocks) at a given point in time. Subsequent writes to it will happen to new blocks, leaving the ones connected to the snapshot undisturbed.
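A toy model of that copy-on-write behavior in Python, assuming a shared block store keyed by integer block ids. `CowFile` is hypothetical and never frees unreferenced blocks. A snapshot copies only the list of block ids, and a later write allocates a new block instead of modifying one the snapshot still references.

```python
class CowFile:
    """Toy copy-on-write file: an ordered list of ids into a shared block store."""

    def __init__(self, store: dict, blocks: list):
        self.store = store                          # block id -> bytes, shared
        self.block_ids = [self._alloc(b) for b in blocks]

    def _alloc(self, data: bytes) -> int:
        bid = len(self.store)                       # ids are never reused here
        self.store[bid] = data
        return bid

    def snapshot(self) -> list:
        # Copies block *references* only; no file data is duplicated.
        return list(self.block_ids)

    def write_block(self, index: int, data: bytes) -> None:
        # Never overwrite in place: new block, then repoint the live file.
        self.block_ids[index] = self._alloc(data)

    def read(self, block_ids=None) -> bytes:
        ids = self.block_ids if block_ids is None else block_ids
        return b"".join(self.store[i] for i in ids)


store = {}
f = CowFile(store, [b"aa", b"bb"])
snap = f.snapshot()
f.write_block(1, b"BB")
print(f.read(), f.read(snap))
```

The snapshot shares the unchanged first block with the live file and only the rewritten block costs extra space, which is what makes snapshots cheap enough to take before every backup run.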
Well, consistency is important too in backups. So TM will probably make a snapshot and do the backup from there.
That avoids problems like files being moved from directories not yet processed into ones that were already processed.
"Can't be used as the startup disk" is not necessarily a strong limitation; with FileVault enabled, you start up from your Recovery partition anyway. Even if they lose FileVault, I would guess they'll keep the Recovery partition setup (since they've invested in it a bit as a pseudo-BIOS, for changing things like SIP.) So that image can stay HFS and hold your boot kernel, while "Macintosh HD" can just be APFS. Sort of like Linux with /boot on FAT32 and / in LVM.
Linux /boot tends to be on ext3 or ext4 on most distributions. Recently it's XFS on the server flavor of Fedora, CentOS, and RHEL. For openSUSE the default is Btrfs, /boot is just a directory.
The bootloader/boot manager is what determines your filesystem choices for /boot. GRUB2 reads anything, including ZFS, Btrfs, LUKS, md/mdadm RAID 5/6 (even when degraded), and conventional LVM (but not thin provisioning or LVM's md RAID support).
Presumably as with today, you'll have the option. I don't have a strong opinion on case sensitivity of file names, but I suspect they'll keep it case insensitive by default. I think for the average non-technical user that two files, "MyFile.txt" and "myfile.txt", being different could lead to some confusion, and Apple historically has apparently considered that confusion unacceptable.
The average user is also confused by "MyFile.txt" and " MyFile.txt" being different, or "Proposal II" and "Proposal 2" being different, but filesystems aren't usually built around that. I don't think case sensitivity is special enough to get that sort of treatment.
More problematic is that many case insensitive hard drives would be copied into new machines and there would be millions of conflicts. Some utility would have to sit there and annoy people by asking them to make decisions.
I can see why there would be conflicts going case sensitive -> case insensitive, but I can't see why there would be conflicts going the other way. Am I missing something?
If you have a file named "MyFile.txt" and another system is looking for "myfile.txt", it won't be found, and Apple won't let you rename it because it thinks the rename is a no-op. That's frustrating as hell.
Git init / clone on case-insensitive HFS+ sets `git config core.ignorecase true`, which can lead to confusing behaviour where it ignores a change in the case of a filename.
> The default is false, except git-clone(1) or git-init(1) will probe and set core.ignoreCase true if appropriate when the repository is created.
I think the parent meant the other way around too.
However, the transition between the case insensitive and case sensitive filesystems isn't going to happen overnight. People will be copying files around both ways for quite some time, so the insensitive -> sensitive case is still going to be a concern.
I think you have it backwards. If you try to expand an archive with FOO.TXT and foo.txt, what should happen if you're writing to a case insensitive file system?
$ touch HI
$ touch hi
$ ls
HI
So that's disturbing. Another problem is that every piece of software you can think of will be comparing two file names case-insensitively. Almost weekly I get burned by this.
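Before extracting an archive like the one above onto a case-insensitive volume, the collisions can be detected up front. A minimal Python sketch, assuming `str.casefold()` approximates the target filesystem's case folding (HFS+ and NTFS each use their own case tables, so results for non-ASCII names may differ). `case_collisions` is a hypothetical helper.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that would map to the same entry on a case-insensitive
    filesystem. Returns a list of sorted groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


print(case_collisions(["FOO.TXT", "foo.txt", "bar.txt"]))
```

Running the check first lets a tool ask the user once, instead of silently letting the second `touch` land on the first file as in the transcript above.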
Open up the various folders of Adobe's software (on Windows). The DLL names are a mish-mash of all-lowercase and mixed case. Heck, open up System32; the DLLs there are definitely not case-sensitive capable (`kbd*.dll` being one example). In fact, I bet there's at least one program on your computer that accesses Program Files using `C:\PROGRAM FILES (X86)` instead of `%programfiles(x86)%`. Even environment variables aren't case-sensitive.
For the end-user they could prevent duplicate different-cased file names in the UI layer (the Finder), instead of the file system. That would be a more appropriate place for it anyway.
And then some code using Unix APIs would create two files whose names differ only in case and the UI layer would choke. This is why spray-on usability is bad.
The UI already has to deal with that anyway, because it supports case-sensitive volumes. What exactly constitutes case is locale-specific and differs from one user to the next; that logic would be messy to have inside the file system.
That would likely be a hassle because you'd have to be consistent for all programs that ever save or read a file. As a result it has to be an OS-level thing at least, if not at the file system level. I don't have a huge preference (case sensitive or insensitive), I think it's not worth a religious war, but whatever the choice is, it should be completely transparent to understand what convention the system is using as a coder, and as a general user.
Steam on Mac does, or at least did the last time I tried to use it on a case-sensitive partition. It's not that Steam inherently needs case-insensitivity; it's that parts of the main app refer to files with different casing than what is on disk, so without a case-insensitive FS it cannot find some files. Stupid problem, really.
Both you and cuddlybacon are right. A long time ago, they worked under the assumption that the FS is case-sensitive, and all the games I installed back then had title-cased folder names. Gradually, Valve stopped caring about this, and my games stopped working. I had to go in and manually change some game folder names to lower-case. It then kept some small files under SteamApps and the downloaded games under steamapps. They have fixed that now. Now, I have both CONFIG and config in my Steam folder.
How would it work? By a combination of magic and "we can't be bothered; the users should figure out something".
> Case Sensitivity: Filenames are currently case-sensitive only.
First thought: they have seen the light!
A moment later: wait...they consider this a "limitation", and it's only "currently" the case. So maybe they're going to perpetuate the brain-damage anyway.
It pushes a localization and UI problem down into the filesystem layer. Case-insensitivity is pretty easy for US-ASCII, but in release 2 of your filesystem, you realized you didn't properly handle LATIN WIDE characters, the Cyrillic alphabet, etc. In release 7 of your FS, you get case sensitivity correct for Klingon, but some popular video game relied on everything except Klingon being case-insensitive on your FS, and now all of the users are complaining.
How do you handle the case where the only difference between two file names is that one uses Latin wide characters and the other uses Latin characters? This one bit me when writing a CAPTCHA system back in 2004. (Long story, but existing systems wouldn't work between a credit card processing server that had to validate in Perl, and a web form that had to be written in PHP, where the two systems couldn't share a file system. It's simple enough to do using HMAC and a shared key between the two servers, but for some reason, none of the available solutions did it.) I noticed that Japanese users had a disturbingly high CAPTCHA failure rate. It turns out that many East Asian languages have characters that are roughly square, and most Latin characters are roughly half as wide as they are tall, so mixing the two looks odd. So, Unicode has a whole set of Latin wide characters that are the same as the Latin characters we use in English, except they're roughly square, so they look better when mixed with Unified Han and other characters. Apparently most Japanese web browsers (or maybe it's an OS level keyboard layout setting) will by default emit Latin wide unicode code points when the user types Latin characters. Whether or not to normalize wide Latin characters to Latin characters is a highly context-dependent choice. In my case, it was definitely necessary, but in other cases it will throw out necessary information and make documents look ugly/odd. Good arguments can be made both ways about how a case-insensitive filesystem should handle Latin wide characters, and that's a relatively simple case.
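A short Python sketch of the two choices discussed above, assuming a filesystem that compares names using Unicode case folding. `str.casefold()` handles cases like German ß, but leaves fullwidth ("Latin wide") letters distinct from ASCII unless the names are NFKC-normalized first. `names_match` is a hypothetical helper, not any filesystem's actual rule.

```python
import unicodedata


def names_match(a: str, b: str, fold_width: bool = False) -> bool:
    """Case-insensitive name comparison, optionally folding fullwidth forms.

    casefold() applies full Unicode case folding (e.g. "ß" -> "ss") but maps
    fullwidth "Ａ" only to fullwidth "ａ". NFKC first maps fullwidth Latin
    to plain Latin, along with other compatibility characters.
    """
    if fold_width:
        a = unicodedata.normalize("NFKC", a)
        b = unicodedata.normalize("NFKC", b)
    return a.casefold() == b.casefold()


print(names_match("STRASSE", "Straße"))                      # full case folding
print(names_match("ＭｙＦｉｌｅ", "myfile"))                  # width preserved
print(names_match("ＭｙＦｉｌｅ", "myfile", fold_width=True))  # width folded
```

NFKC also folds ligatures and other compatibility characters, so turning it on merges names some users consider distinct, which is exactly the context-dependent trade-off the comment describes.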
Most users don't type names of existing files, exclusively accessing files through menus, file pickers, and the OS's graphical command shell (Finder/Explorer). So, if you want to avoid users getting confused over similar file names, that can be handled at file creation time (as well as more subtle issues that are actually more likely to confuse users, such as file names that have two consecutive spaces, etc., etc.) via UI improvements.
Just saw a comment in another thread stating that Apple had slipped in improving their Unix layer, and here comes this. A new file system is not a joking matter: Microsoft failed to deliver their new FS; Linux took years to go from ext2 to ext3 to ext4, and btrfs is in perpetual testing; most of the *BSDs still use their old ones; ZFS took a decade to become mainstream...
The information currently is very scarce on this one, but I hope they would at least test it REALLY WELL.
And Linux has the blessing/curse of having a million distros, each implementing their installer a little differently, exposing subtle bugs and inconsistencies in the FS/bootloader department.
And AFAIK, GNU Grub hasn't done a release in about four years now, and all the distros are using their own, custom beta build of it. It's a bit of a mess.
Many distros didn't support other file systems (JFS, XFS, et al back in the day) officially, but you could usually find a way to use them as boot or a forked installer.
I realize new file systems are difficult, but HFS+ is an ancient mess that has needed replacing for a long while. This isn't new and innovative so much as finally getting around to removing technical debt and catching up with the rest of the world.
Windows and WinFS is a bad comparison. WinFS was just a tagging/metadata system on top of NTFS with a SQL storage backend. We're still quite far from being able to tag files with custom metadata and easily query it using the default file chooser dialogs.
I'm going to be very cynical and say that based on their track record of major OS overhauls I am _not_ looking forward to the bugs and issues that will sneak past their QA. I still have nightmares of all the bugs with their wifi & USB stack changes in the last couple OSX releases. Please Apple, for your own good don't 'move fast and break things' with the filesystem. At the very least it's time for everyone to make sure they have a good backup system in place before touching this thing.
While I am happy that Apple have at last committed to replacing HFS+, I'm wondering why they didn't use ZFS rather than reinventing the wheel. It's not like it's a particularly easy wheel to reinvent either; the amount of effort which goes into a filesystem like ZFS is non-trivial. Why not build on top of that?
I would have greatly appreciated being able to use ZFS with MacOS X, for datasets, snapshots, sending them to remote pools for backup etc. It would have made it directly interoperable with a lot of pre-existing and cross-platform infrastructure. (I couldn't care less if it didn't scale down to the "watch". Filesystems are not a one-size-fits-all affair.) I find it great that I can take a set of disks from e.g. Linux, run "zpool export", pull them out, and then shovel them into a FreeBSD system, run "zpool import" and have the pool and datasets reassembled and automatically mounted. Perfectly transparent interoperability and portability. While Apple like to do their own thing, this is one place I would have definitely appreciated some down to earth pragmatism and re-use of existing battle-tested and widely used technology.
I think the barriers to getting ZFS into OS-X were probably legal, not technical ones. There's long history of the CDDL tripping up reasonable attempts to use ZFS and I doubt Apple would have wanted to proceed without approval and/or alternate licensing. Likewise Oracle has no real reason to let them do that without writing a check with lots of zeros on it.
There's also a decent chunk of what ZFS supports (particularly flexible volume pool management) that would be useless on almost every machine that Apple makes and sells. Your example of pulling a drive out of one machine and putting it in another is either impossible (soldered-on storage) or highly unlikely (user-serviceable SSD, but hidden behind a bunch of pentalobe screws) with their modern hardware lineup.
Many of those companies probably started using ZFS before Oracle acquired Sun. After Oracle acquired Sun the risks of using ZFS skyrocketed legally and Apple had the option of just not taking on that risk since they hadn't shipped it yet.
Given what we know about Oracle that was probably the right call.
I read the link and have a pretty good understanding of how computers and file systems work, but can someone "explain like I'm a decently intelligent programmer" what is different about this file system? Thank you.
This bit from 2012 by John Siracusa outlines everything “wrong” with the old file system, HFS+, and doubles as a sort of guide to what this new file system fixes.
Hopefully this will be the end of .DS_Store and the crazy unicode normalization issue that causes mixups when rsync-roundtripping a directory structure between ext4 and HFS! :)
- On Windows NTFS, filenames are "opaque sequences of WCHARs", and are thus "kind of" UTF-16 with no formally required normalization form. Windows itself tries to use NFC, but applications are free to use the Windows APIs to create a filename with anything they like. Since filenames are just sequences of 16-bit WCHARs, unpaired surrogates are allowed (and can break all sorts of code!)
- On Linux, filenames are opaque sequences of 8-bit characters. The only requirement is that a filename not contain either a slash or NUL character. No other formal specification exists, although "most" users these days use UTF-8. (However, you can and will find loads of filesystems with invalid UTF-8, usually because filenames are in one of the ISO encodings instead).
So the main thing I've observed is that rsync'ing web files from a Linux server to an OS X laptop and back to the Linux server ends up with a bunch of duplicate decomposed UTF-8 filenames on the Linux server. It can be avoided with careful use of rsync's "--iconv=utf-8-mac,utf-8" etcetera, but it feels super unnecessary. Fix it already :D
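The mismatch is easy to reproduce in Python. HFS+ stores file names decomposed (a variant of Unicode NFD), while ext4 stores whatever bytes it is given, so the same visible name can end up existing twice on the Linux side. A minimal sketch; `dedupe_normalized` is a hypothetical helper for spotting such pairs.

```python
import unicodedata

name = "café.html"
nfc = unicodedata.normalize("NFC", name)   # é as one code point, U+00E9
nfd = unicodedata.normalize("NFD", name)   # e followed by U+0301 COMBINING ACUTE ACCENT

# Same text to a human, different bytes on disk.
print(nfc == nfd)             # False
print(nfc.encode("utf-8"))    # b'caf\xc3\xa9.html'
print(nfd.encode("utf-8"))    # b'cafe\xcc\x81.html'


def dedupe_normalized(names):
    """Return the distinct names after NFC normalization, sorted."""
    return sorted({unicodedata.normalize("NFC", n) for n in names})
```

Running `dedupe_normalized` over a directory listing that contains both spellings collapses them to one entry, which is a quick way to find the duplicates an rsync round trip left behind.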
PS: I've been using LC_CTYPE="whatever.ISO8859-1" and an ISO-8859-1 Terminal.app locale forever since I seem to keep dragging a bunch of legacy filenames around (having started on MS-DOS and FreeBSD 2.2) and ISO-8859-1 still seems to be the only locale that lets me "see the bytes" matrix-style instead of a random amount of "?" chars. Curiously, Finder.app seems to keep up very very well despite the odd encoding. Crossing my fingers the new APFS will act more like Linux.
PPS: Java is especially hilarious when launched with -Dfile.encoding=utf-8 as it is literally impossible to access some files from there.
The point of encrypting the base system image is mostly to make it part of the Secure Boot chain. rootless is a policy control to stop processes from modifying the OS from inside. An encrypted system image stops things with hardware access (but not the unlock key) from modifying the OS from the outside. You can trust any disk whose blocks you can decrypt with key X, to have been only written to by someone with key X.
You'd have to sit there and read the entire base system image before booting to verify that signature. But only have to decrypt a block when it comes time to read that block—with the ability to kernel-abort right then during the boot process if the block doesn't decrypt. (Or were you suggesting individually signing every block?)
You don't sign the whole image as a stream, and you don't sign every block. Recursion is your friend! You sign the Merkle tree root, check it once, and then check O(log n) hashes per block access. You can, of course, amortize the checking of the first several of those hashes as a further optimization that ties in easily with your caching layer.
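A minimal sketch of that scheme in Python, assuming SHA-256, a binary tree, and duplication of the last hash on odd-sized levels (one common convention among several). Leaves and interior nodes get different prefix bytes so one cannot be passed off as the other. It verifies a single block against the root using only O(log n) sibling hashes; signing the root is out of scope, and `build_tree`, `proof` and `verify` are hypothetical names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return all levels of the tree (at least one block): leaf hashes first, root last."""
    level = [h(b"\x00" + b) for b in blocks]       # 0x00 prefix marks a leaf
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                         # odd level: pair last hash with itself
            level = level + [level[-1]]
        level = [h(b"\x01" + level[i] + level[i + 1])  # 0x01 prefix marks a node
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof(levels, index):
    """Sibling hashes from leaf to root: one per level below the root."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        path.append(level[sib] if sib < len(level) else level[index])
        index //= 2
    return path


def verify(root, block, index, path) -> bool:
    node = h(b"\x00" + block)
    for sibling in path:
        if index % 2 == 0:
            node = h(b"\x01" + node + sibling)     # we are the left child
        else:
            node = h(b"\x01" + sibling + node)     # we are the right child
        index //= 2
    return node == root
```

The root is checked once against its signature; after that each block read costs a handful of hash computations, and the upper levels of the path are shared between nearby blocks, which is the amortization the comment mentions.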
There's no such thing as "the block doesn't decrypt" absent MACs/MICs or AEAD schemes -- encryption and decryption are just maps from N bytes to N bytes.
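To make "the block doesn't decrypt" meaningful, something has to authenticate each block. A minimal sketch using HMAC-SHA256 from Python's standard library, assuming a per-block tag stored alongside each block; encryption itself is omitted, and a real design would more likely use an AEAD mode such as AES-GCM or encrypt-then-MAC. `tag_block` and `check_block` are hypothetical names.

```python
import hashlib
import hmac


def tag_block(key: bytes, block_no: int, data: bytes) -> bytes:
    """MAC over (block number, contents).

    Binding the block number stops an attacker from swapping two valid blocks.
    It does NOT stop replaying an older valid version of the same block; that
    needs a freshness check such as a signed hash-tree root.
    """
    msg = block_no.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()


def check_block(key: bytes, block_no: int, data: bytes, tag: bytes) -> bool:
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return hmac.compare_digest(tag_block(key, block_no, data), tag)
```

Without a tag, flipping a ciphertext bit just yields different plaintext; with one, `check_block` returns False and the kernel has a concrete reason to abort the read.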
I'm not the one conflating them; it's the Secure Boot people who think this is a good idea. Full-Disk Encryption is the defined "OS" stage of the Secure Boot chain-of-trust today, acting as an "optimization" (heh) over signing disk blocks.
It's certainly more secure (indeed, it prevents replay attacks) to just keep a big block-hash table, update it when blocks change, and then hash that table and sign it on fsync, but it's costly in a few ways compared with just trusting unauthenticated encryption, and was even more so five to eight years ago when Secure Boot was being formulated.
These days, you see a lot of wholly-signed read-only OS images: the OSX recovery partition is signed, CoreOS signs its OS images, most firmware is signed, etc. But I don't expect the unauthenticated encryption on most computers' read-write rootfs to be replaced by a signed-but-unencrypted filesystem any time soon, if only because consumers really seem to hate the idea of separate OS and data partitions, especially when the OS partition is "stealing space" they could be using for data. (The only thing I can think of that might finally kill this is making the default install on some consumer OS create a thin pool, such that an OS partition that only contains 5G of data only "steals" 5G of their "space.")
Or, y'know, authenticated encryption. Do any block-device cryptosystems support an AEAD mode yet? LUKS maybe?
> "You can trust any disk whose blocks you can decrypt with key X, to have been only written to by someone with key X."
No, you cannot. That is your sentence, not the Secure Boot people's.
Also, I assume that APFS will support encrypted and unencrypted logical filesystems in the same space-sharing container, so the separate OS partition is just an unencrypted logical filesystem. That's what I meant in my original post.
GELI (via its HMAC option) and AES-GCM are authenticated. I'm not sure whether GCM has properties equivalent to the GELI HMAC feature, but it's probably good enough.
Wow! This makes me feel good; maybe Apple gets that many of its users use its devices because they present a proper UNIX desktop environment. And that environment needs to be cherished, and it needs to evolve.
This is not a small thing. We had a nice visual overhaul two years ago; now Apple needs to pick it up at the under-the-hood level.
It would be really nice if there were a modern filesystem that "just worked" regardless of device. I'm tired of exFAT/FAT being the only filesystem I can reliably use on multiple different OSes painlessly, and even then I can't use it for all the functions of those OSes (no Time Machine). Hopefully this will be open enough to enable that, though who knows how it will shake out.
Getting UDF to work with rewritable media, like usb sticks, is an unholy pain in the arse. It involves trying to remember an arcane combination of versions and feature flags to get the universal format to actually be universal. I managed it once, but after that, realized that the network was fast enough for most of what I needed and I tend to not have the four gig files that cause problems with FAT filesystems.
From what I understand, UDF is primarily targeted at one-time-recordable media. (It's used in DVDs, for example.) It's unclear how well it works for primary storage, but I suspect it's clumsy at best.
> You can share APFS formatted volumes using the SMB network file sharing protocol. The AFP protocol is deprecated and cannot be used to share APFS formatted volumes.
Much like other modern filesystems such as ZFS and BTRFS, it supports multiple filesystems in a shared storage space, snapshots, and copy-on-write, and it also has encryption built in as a first-class feature, including multiple keys and per-file keys.
There are many limitations in the developer beta so this is clearly still very much a work in progress. Getting these file-systems right is traditionally difficult and can take years (see ZFS, BTRFS) so it will be interesting to see how well it does.
No snapshotting yet, can't boot, no migration tools for existing setups (time machine, filevault, fusion). It'd be interesting to see whether their permission changes introduced in 10.11 will travel further down the fs. Also, I don't see any mention of compression like lz4. Lots of work until release in 2017!
On the other hand, people (sample size: my family and friends) tend to fill up their drives with already-compressed video and photos, which lz4 and similar cannot really compress further.
Because you can upsell a larger iCloud subscription to your customers using the new built in automatic cloud archival function.
Then use compression server-side. $$$
lz4 really helps with anything compressible: if you store text or other compressible data, it can improve throughput 2-3x or even more, and if the data isn't compressible, it stops compressing early. On a modern CPU core, lz4 compresses at hundreds of megabytes per second and decompresses at multiple gigabytes per second.
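A sketch of the "stop early on incompressible data" idea. lz4 isn't in the Python standard library, so zlib at level 1 stands in for it here; the 4 KiB sample size and the 12.5% minimum-saving threshold are assumptions chosen for illustration, not any particular filesystem's settings.

```python
import zlib

MIN_SAVING = 1 / 8  # assumed: keep the compressed form only if it saves at least 12.5%


def worth_it(raw_len: int, packed_len: int) -> bool:
    return packed_len <= raw_len * (1 - MIN_SAVING)


def store_block(data: bytes, sample_size: int = 4096, level: int = 1):
    """Return (is_compressed, payload); zlib level 1 stands in for lz4 here."""
    sample = data[:sample_size]
    if not worth_it(len(sample), len(zlib.compress(sample, level))):
        return False, data  # early abort: the sample looks incompressible
    packed = zlib.compress(data, level)
    if not worth_it(len(data), len(packed)):
        return False, data
    return True, packed


def load_block(is_compressed: bool, payload: bytes) -> bytes:
    return zlib.decompress(payload) if is_compressed else payload
```

Already-compressed video and photos fail the sample check and are stored raw after compressing only a few KiB, so the cost of leaving compression on for them stays small.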
Well, have you seen the comments on Reddit and MacRumors and the like? You'd think anything above "dark mode" and a new MacBook model would be over their heads.
Most people do seem to assume that WWDC is an hour-long commercial for everything Apple is going to do for the rest of the year.
I have a theory on this regarding consumer devices:
Checksumming generates a lot of support calls that would otherwise not happen. As long as the errors are in media files and not metadata, most consumers are going to be oblivious to bit rot in their downloaded movies and photos.
Enabling checksumming is going to reveal a lot of errors that would otherwise be silently ignored (speaking from experience running a ZFS media server).
Just because you do checksumming doesn't mean you have to report those errors in a way that might frighten a non-technical user. How many regular Mac users sit there looking at the syslog all day?
That is insane! I just so strongly assumed that any new advanced filesystem would have checksumming, I read the entire document without realizing it is never mentioned.
It would be unbelievable if they really didn't have checksumming. Could it just be that they haven't documented it yet? That seems weird and unlikely, but... not as weird and unlikely as APFS not having checksumming.
Heck, if someone's going whole-hog on file integrity, I'd really like them to support the creation of files that are stored "expanded" by a fountain code.
Rather than having to RAID-mirror every block on my disk, I'd like to be able to pick just some files and say "please store these slightly more redundantly, so that they're protected from bit-level disk errors; they're important."
Something like zfs copies=n? I used this on a drive with tons of bad sectors and it worked flawlessly; ZFS even spreads the writes throughout the disk. However, AFAIK this is a property of using Merkle hash trees (just add multiple leaves), and it's probably unlikely that Apple went this way.
Yes, what's more difficult is implementing data checksumming without CoW. What you may not be used to, coming from ZFS, is a nodatacow option (per file or per filesystem), which does exist on Btrfs and implies nodatasum (no checksums for data; metadata is still always checksummed and CoW). Conversely, nodatasum does not imply nodatacow.
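For the "store just these files a bit more redundantly" wish, here is a sketch of a much simpler scheme than a fountain code: a single-parity erasure code (RAID-5-style XOR parity within one file) plus per-chunk SHA-256 checksums, which tell the reader which chunk is bad so it can be rebuilt. A fountain code would instead generate an open-ended stream of encoded symbols; this only survives one damaged chunk. The function names are hypothetical.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split into k equal chunks plus one XOR parity chunk; checksum every chunk."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\x00")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    stored = chunks + [reduce(xor_bytes, chunks)]
    sums = [hashlib.sha256(c).digest() for c in stored]
    return stored, sums, len(data)


def decode(stored, sums, length: int) -> bytes:
    """Use the checksums to find a bad chunk, then rebuild it from the others."""
    bad = [i for i, c in enumerate(stored) if hashlib.sha256(c).digest() != sums[i]]
    if len(bad) > 1:
        raise ValueError("more than one damaged chunk; cannot repair")
    chunks = list(stored)
    if bad:
        i = bad[0]
        chunks[i] = reduce(xor_bytes, [c for j, c in enumerate(chunks) if j != i])
    return b"".join(chunks[:-1])[:length]  # drop parity and padding
```

With k = 4 the file costs 25% extra space instead of 100% for a mirror, at the price of tolerating only one bad chunk.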
Checksumming makes no sense on modern hardware. Hard Drives and SSDs use CRCs already.
When a sector goes bad on a hard drive, the firmware will mess about retrying and altering the analog amplifiers to try to get the signal back. If it gets the data back, it might "recover" by moving the data to spare sectors. Without a CRC, the firmware would have no way of differentiating between a sector read correctly and one that is read erroneously.
Checksumming makes sense when the checksum accompanies a pointer to another block. i.e. a directory entry says subdirectory contents are found in block 1027 /with checksum 0xdeadbeef/.
If block 1027 is errantly overwritten due to a bug in the filesystem, the block driver, the DMA subsystem, or the device firmware, the FS will know as soon as it goes to fetch block 1027 that something went wrong. This is true even if the block is still internally consistent; perhaps the write was misdirected, another sector was incorrectly read, or a failure in the flash translation layer caused a newer write to be lost.
If there's any redundancy to the system, whether storing the metadata in multiple places on the disk, or on different disks in an array, the FS can then check all the others and, crucially, detect which version is correct.
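A toy sketch of checksum-in-the-pointer with redundant copies, loosely modeled on the description above rather than on any real on-disk format. The Disk class, BlockPtr, write_block and read_block are hypothetical; block 1027 is borrowed from the example. A misdirected write that leaves an internally consistent but wrong block at 1027 is caught because the parent pointer's checksum no longer matches, and the good copy is used to repair it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockPtr:
    addresses: tuple  # locations of the redundant copies
    checksum: bytes   # expected SHA-256 of the block contents, stored with the pointer


class Disk:
    """Toy disk: a dict from block address to contents."""

    def __init__(self):
        self.sectors = {}

    def write(self, addr: int, data: bytes) -> None:
        self.sectors[addr] = data

    def read(self, addr: int) -> bytes:
        return self.sectors.get(addr, b"")


def write_block(disk: Disk, addrs, data: bytes) -> BlockPtr:
    for a in addrs:
        disk.write(a, data)
    return BlockPtr(tuple(addrs), hashlib.sha256(data).digest())


def read_block(disk: Disk, ptr: BlockPtr) -> bytes:
    good, bad = None, []
    for a in ptr.addresses:
        data = disk.read(a)
        if hashlib.sha256(data).digest() == ptr.checksum:
            if good is None:
                good = data
        else:
            bad.append(a)
    if good is None:
        raise OSError("every copy fails its checksum")
    for a in bad:  # self-heal: overwrite the wrong copies with the verified one
        disk.write(a, good)
    return good
```

Note the decision about which copy is correct comes from the parent's checksum, not from the copies agreeing with each other.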
At least with bigger SATA disks, I regularly see checksum errors on btrfs/ZFS and a rising bad-sector count. The disk usually replaces the sectors on the next write, but when reading it returns either zeros or wrong data. It's nice to know which files are affected by bad sectors. It's not common, but common enough to be worth having.
Several hundred 4TB HGST enterprise disks in a cluster at a university. Disks get replaced ASAP, but I often see the checksum errors in the log files when a disk starts to fail.
Btw: So far the Backblaze reliability numbers check out (<1% annual failure rate)
SSDs use CRC to deliver their probabilistic data storage approximation. FS level checksums are, among others, for when that fails.
FS level checksums also cover bitflips on your bus during the write phase (the drive will crc the already bad data) or during the read phase (you receive different data than your drive sent).
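A small simulation of the bus-error case, assuming a single flipped bit between host memory and the drive. The drive's CRC32 (zlib.crc32 here) is computed over whatever arrived, so it stays self-consistent, while the filesystem's SHA-256, computed before the data left host memory, catches the change. The function names are hypothetical.

```python
import hashlib
import zlib


def flip_bit(data: bytes, bit: int) -> bytes:
    b = bytearray(data)
    b[bit // 8] ^= 1 << (bit % 8)
    return bytes(b)


def write_then_read(data: bytes, bus_error_bit=None):
    """Return (drive_crc_ok, fs_checksum_ok) after one simulated round trip."""
    fs_sum = hashlib.sha256(data).digest()  # FS checksums while data is still in host RAM
    arrived = data if bus_error_bit is None else flip_bit(data, bus_error_bit)
    drive_crc = zlib.crc32(arrived)         # drive CRCs whatever arrived over the bus
    stored = arrived
    drive_ok = zlib.crc32(stored) == drive_crc
    fs_ok = hashlib.sha256(stored).digest() == fs_sum
    return drive_ok, fs_ok
```

With a bus error the result is (True, False): the drive happily reports a clean read of data that was never what the host wrote.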
My No. 1 guess as to why it wasn't in the keynote was that Apple isn't sure when this will ship. I think they are hoping for a year from now, but a new filesystem is hard to pin down.
The keynote was for consumers and the press. The 2 PM PT Platforms State of the Union is for developers and will likely go into this more, and if not there is a session tomorrow specifically about it.
Still... marketing a fully encrypted, snapshot-capable filesystem focused on SSD/flash would make much more impact at an event like that than "native tabs" for apps.
Anyway I'm not a specialist, just another IT guy, so maybe I'm too opinionated to argue more.
The press understands "apps" and "tabs". The press doesn't understand "filesystem snapshots".
Part of the reason Apple has historically gotten much better press from WWDC than most tech companies do from their big developer events is that they laser-focused their keynote on being Press-digestible and not full of incomprehensible tech terminology that bores non-geeks to tears.
As evidenced by this thread, the people "filesystem snapshots" can be marketed to didn't need it to be mentioned in the keynote to get them talking about it. Hence, using precious keynote time for it would have been a waste of marketing resources.
I don't think this is going to be a part of macOS Sierra as a standard feature, given that it has a release date in 2017 while Sierra will be out later this year. So it makes sense to not include it in a Sierra announcement.
I'm notoriously skeptical of anything from the local-filesystem world, and I'm no Apple fanboy, but this looks pretty cool. I look forward to hearing how some of it works, especially what's new about their version of COW and why end-to-end checksumming wasn't mentioned.
In theory (the practical side is tricky to get right):
- More resilience against data corruption, thanks to copy-on-write and likely checksumming. In theory, a power failure should not harm the filesystem: you can roll back to the last known-good state and lose only the last few seconds of data, if any.
- Snapshots: before installing updates you can snapshot the filesystem and roll back instantly. You could also create a read-only snapshot to get consistent backups. This should also allow seamless rollback of OS updates. Clones are supported too, i.e. multiple diverging versions of a file tree; think of something similar to git branches at the filesystem level.
- Native encryption at the filesystem level
- Possibly better handling of metadata: faster access to directories with lots of files, and more files on a volume without slowdown.
- Sparse files: you can create huge files instantly and fill them later. I guess that's important for various types of software, e.g. virtual machine images.
But as nice as these properties are in theory, history has shown that in practice a lot can go wrong.
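A toy model of the snapshot/rollback/clone semantics listed above. It's a simplification: file contents are immutable bytes objects shared between the live volume, snapshots and clones, and a snapshot copies only the path-to-content table, whereas a real CoW filesystem would also share that tree and make snapshots O(1). CowVolume and its methods are hypothetical names.

```python
class CowVolume:
    """Toy copy-on-write volume. File contents are immutable bytes objects that
    snapshots and clones share; a write installs a new object instead of
    modifying the old one in place."""

    def __init__(self, files=None):
        self._files = dict(files or {})
        self._snapshots = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def snapshot(self, name: str) -> None:
        # Copies only the path -> content references, never the file data.
        self._snapshots[name] = dict(self._files)

    def rollback(self, name: str) -> None:
        self._files = dict(self._snapshots[name])

    def clone(self, name: str) -> "CowVolume":
        # A writable branch that diverges from the snapshot, git-style.
        return CowVolume(self._snapshots[name])
```

Typical use mirrors the OS-update case: snapshot("pre-update"), apply the update with write(), and call rollback("pre-update") if something breaks.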
Well, the issue was that it was a closed filesystem owned by Microsoft and used on Linux for flash drives and SD cards.
If you use closed technology, you get burned. I didn't have a problem with Microsoft doing what they did; we should have used an open filesystem for those devices. I personally have used ext2/rfs on my drives, and it is a pretty light security system ;)
There's no such thing as "closed" or "open": something can be open source (copyright law), but patented (patent law), and it can be closed sourced, but not patented.
That's not really a useful statement. HFS+ has worked pretty well for many years and is really well tested. This will be a completely new filesystem, which means it will have lots of new code that can have bugs or incompatibilities with current apps. APFS sounds like it's going to be great, but I think not being HFS is a disadvantage rather than an advantage.
Come on. HFS+ has been a shame for 15 years now. It's been the laughing stock of the filesystem community for ages. They got the developer of BeFS on board to patch the thing until it apes a modern filesystem for a couple of days at a time, and that's about it. It deserves to die an ugly death, strangled in a back alley without mercy. It's about as good as LINDOS/FAT32.
God, what a relief, we're almost done with this monstrosity of HFS+. Didn't you hear the collective sigh of anyone having to _work_ with this flea-market antique FS?
Technically, under the covers HFS isn't anything great. But from a user's point of view, Apple has extended it in ways such that it hasn't mattered to customers.
Now you've got me wondering what would be required to convince the OSX kernel to boot from an NTFS-formatted volume. It certainly has the drivers (even if they're normally set to only mount NTFS volumes read-only.)
HFS+ didn't lose any files, his hard drive did. He wouldn't necessarily be able to get them back with filesystem-level checksum either, but he would know they're gone faster.
BTW, disk images already provide checksumming, and so does authenticated encryption.
Even on Chrome desktop, if you turn on Mobile emulation and switch it to appear to be any Android device w/ phone display size, it fails in the same way - yet switch it to 'iPhone 5' or 'iPhone 6' and the same compact layout works fine. They're pointlessly sniffing and special-casing 'Android' user-agents and botching it completely.
Gruber, who usually has good inside sources, on Apple dropping ZFS[1]:
> Word on the street in Cupertino is that dropping ZFS wasn’t an engineering decision, but a legal one, and it might have had something to do with Oracle’s acquisition of Sun. I don’t know if it was a problem with the terms of the CDDL license, general distrust/dislike for Oracle, or what — only that the word came down from legal that ZFS was a no-go.
Apple has a notoriously paranoid pack of legal sharks, and they clearly felt uncomfortable with something about the ZFS legal situation.
The unofficial word is that Apple wanted Sun to sign an indemnification agreement, Sun had agreed to sign it on the day that Oracle's acquisition of them closed and Oracle refused to sign it.
You are correct that this happened pre-Oracle, but note that it was confirmed by Sun's spokesperson.
(Also note that the national labs, who have been continuing ZFS on Linux, are immune to a lot of these issues. Oracle literally cannot get an injunction against them on patent infringement issues, so they are guaranteed they can always work on it.)
Open-source is great, but if that source is covered by patents you're potentially liable for damages. Nobody would ever willingly expose themselves to that.
When Apple dropped ZFS, Sun was in the middle of a lawsuit with NetApp regarding ZFS. If I recall correctly, Apple wanted Sun to be on the hook in the event of any possible patents, but they balked. Apple isn't going to touch ZFS.
Apple says they intend to open-source the file system in the docs.
Open Source: "An open source implementation is not available at this time. Apple plans to document and publish the APFS volume format when Apple File System is released in 2017."
It says they plan to document and publish the volume format but nothing about the source code of the implementation. It being "not available at this time" doesn't necessarily imply that it will be available in the future.
Neither the format nor the implementation of Core Storage is documented by Apple. As near as I can tell, not even an fs magic number is documented, i.e., an offset and a signature that make it possible to identify Core Storage physical volumes.
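For comparison, this is what a documented magic number buys you: a probe can identify a volume by reading a few bytes at a known offset. The HFS+ values (volume header 1024 bytes in, signature "H+", or "HX" for HFSX) come from Apple's TN1150; the APFS value ("NXSB" right after the 32-byte object header of the container superblock) comes from Apple's later-published APFS reference. Core Storage is deliberately absent because, as noted, its signature isn't documented. The identify function is a hypothetical sketch.

```python
HFSPLUS_SIG_OFFSET = 1024  # HFS+ volume header starts 1024 bytes into the volume (TN1150)
APFS_MAGIC_OFFSET = 32     # nx_magic follows the 32-byte object header in block 0


def identify(image: bytes) -> str:
    """Guess a volume type from magic bytes at fixed offsets."""
    if image[APFS_MAGIC_OFFSET:APFS_MAGIC_OFFSET + 4] == b"NXSB":
        return "apfs"
    sig = image[HFSPLUS_SIG_OFFSET:HFSPLUS_SIG_OFFSET + 2]
    if sig == b"H+":
        return "hfsplus"
    if sig == b"HX":
        return "hfsx"
    return "unknown"  # e.g. a Core Storage physical volume, with no documented magic
```

Real probes such as libblkid do essentially this, just with many more formats and extra sanity checks on the surrounding fields.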
My understanding was that FaceTime and iMessage weren't open sourced because of issues related to a patent troll (VirnetX IIRC). I'd be interested in hearing from others that are more familiar with the situation.
Or perhaps concern with having to deal with 3rd party iMessage/Facetime clients while at the same time trying to maintain a high level of security. Apple doesn’t want anything to tarnish the security reputation of these products.
Not only that, but apparently Steve Jobs decided to just say that on stage and while it had been discussed a bit it wasn't the plan and came as a total shock to the people involved.
> Apple has a history of making false promises in regards to open sourcing products
No, not true at all. iMessage was never promised, mentioned, or even hinted at being open source. Also, on the FaceTime front, they said that the underlying protocols would be submitted to standards bodies as an open standard; that's a completely different thing from open source. And the reasons Apple didn't wind up doing that are well known (a lawsuit from patent trolls).
Sounds like you have an axe to grind or only listen to people who do...
This is interesting because just yesterday I saw a discussion here on HN about how sad people were that Apple was doing nothing to improve OSX and was putting all the focus on iOS instead. Heh.
I had been nervous that if and when Apple finally got around to making a new file system, they would follow the trend and neither open source it nor document the on-disk format. HFS+ itself has always been open by virtue of being included in the open-sourced core xnu kernel, but Apple's CoreStorage volume manager (the basis for FileVault and Fusion Drive) and DMG disk image format have always lived in closed-source kernel extensions and are not officially documented; Microsoft has acted similarly with its NTFS, ReFS, and exFAT file systems and Storage Spaces volume manager.
The result, of course, is that nothing can access anything else's file system, at least well. Most of the formats I mentioned have been reverse engineered to some extent, but there are limits, especially when you're scared of corrupting data by misinterpreting the format and therefore limit your implementation to read access. If you want read-write access to NTFS from Linux or macOS, a filesystem which is old enough to drink, you have to rely on either a relatively slow FUSE implementation or some proprietary software. If you dual boot Linux on your Mac and want access to the Mac partition, make sure to turn off disk encryption (and hope you don't run into any evil maids) unless you want to use a read-only FUSE tool, and I hope you don't have a Fusion Drive because I don't think anything supports that at all (could be wrong).
To be fair, this situation cannot be blamed entirely on lack of openness. Lack of interest matters too; as I said, HFS+ has an open source implementation, yet Linux's hfsplus driver doesn't support journaling, a feature added in 2002. But if anyone tries to improve the situation, that implementation makes their job a lot easier: for one thing, since filesystem support is in the BSD-based portion of the xnu kernel, it may be possible to port the HFS implementation to other BSDs with relative ease - like the NetBSD rump kernel, which can run in userspace, from which FUSE support wouldn't be that hard... Alternately, if they chose the path of enhancing the native Linux driver, at least they could consult Apple's code to be sure they weren't missing any obscure corners of the format that could cause their driver to corrupt data.
Anyway, what we got with Apple File System is this:
> An open source implementation is not available at this time. Apple plans to document and publish the APFS volume format when Apple File System is released in 2017.
That's pretty good! Open source plus documentation would be ideal, but documentation is much better than nothing[1], and the text of that paragraph doesn't exactly rule out the possibility of the final release being open source too. (I haven't installed the beta yet, but am I right in guessing that APFS is implemented in a kext?)
While there's no guarantee, I hope that that documentation will also come with information about the Core Storage format APFS will usually be wrapped in, so that proper interoperability can be achieved. Rather than my fear coming to pass of one of the last vestiges of openness in proprietary OS storage formats disappearing, the situation may actually be improved compared to today. Maybe in a few years I'll be able to install Ubuntu on my Mac and have full access to the OS X partition out of the box. I can hope...
[1] whether it's better or worse than open source alone is debatable - source can be harder to understand, but it's also never wrong, unlike documentation...
I hope so, though I kinda miss the pervasive use of metadata as in BeFS. That was before its time and such a nice concept. Coupled with saved queries, it's unmatched to this day in mainstream, general purpose file system semantics.
I just do not understand the rationale. File systems are hard stuff; instead of taking what is probably the best filesystem around, they chose a long and painful road. Note that they previously worked on OpenZFS at Apple....
Apple not only ignored the first major consensus in the games industry, it also blocked support for it on its devices. It also stuck with old OpenGL. This will make gaming on Apple suck for many years to come.
Seconded. File systems are hard and specs need to be finalized. Plus the grandparent post reminds me of the old Slashdot meme: ``No wireless. Less space than a nomad. Lame.''
Any filesystem that doesn't cache writes will basically have that property, unless you're talking about unplugging while write operations are happening, in which case all bets are off.
The point of journaling/soft updates/COW/etc. is that all bets are not off; they're intended to prevent the fs from being corrupted in those sorts of situations.
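To illustrate the journaling half of that (soft updates and CoW take different routes to the same goal), here is a toy write-ahead journal: an update is logged, a commit record is logged, and only then is the main store touched. The crash_point parameter simulates power loss at the two interesting moments; JournaledStore and its methods are hypothetical names, not any real filesystem's API.

```python
class JournaledStore:
    """Toy write-ahead journal: an update is logged, then a commit record is
    logged, and only then is the main store modified."""

    def __init__(self):
        self.store = {}
        self.log = []

    def write(self, updates: dict, crash_point=None) -> None:
        self.log.append(("data", dict(updates)))
        if crash_point == "before_commit":
            return  # simulated power loss: no commit record on disk
        self.log.append(("commit", None))
        if crash_point == "before_apply":
            return  # simulated power loss: committed but not yet applied
        self.store.update(updates)
        self.log.clear()  # checkpoint: journal no longer needed

    def recover(self) -> None:
        """Replay committed transactions and discard uncommitted ones. Replay
        is idempotent, so crashing during recovery is also safe."""
        pending = None
        for kind, payload in self.log:
            if kind == "data":
                pending = payload
            elif kind == "commit" and pending is not None:
                self.store.update(pending)
                pending = None
        self.log.clear()
```

Either the whole transaction shows up after recovery or none of it does, which is exactly the "all bets are not off" property.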
Former Apple engineer here. Yes, it sounds snide on the surface, but there is a lot of truth to it. I actually remember those of us on the inside being rather a- and bemused by the "magic" qualities we and our output had suddenly acquired.
The trouble with snidely expressed truth is that the snideness ends up dominating the discussion more than the truth does, and these effects compound over time. Therefore it's important to focus on simply expressing the truth as one understands it, as indeed you did in your comment here.
This is one point where the discourse on a large internet forum differs from that of smaller, more cohesive groups. We have to be vigilant about this on HN because we're more vulnerable to those compounding effects. The discussion doesn't naturally right itself and return to interesting things.
I.e., a "unique copy-on-write design"
> Space Sharing
Basically, ZFS datasets.
> Snapshots
If those can be sent: Finally Time Machine done right.
> The AFP protocol is deprecated and cannot be used to share APFS formatted volumes.
Interesting.
> An open source implementation is not available at this time. Apple plans to document and publish the APFS volume format when Apple File System is released in 2017.
Yay!
Limitations: https://developer.apple.com/library/prerelease/content/docum...
Edit:
> encryption models for each volume in a container: no encryption, single-key encryption, or multi-key encryption with per-file keys for file data and a separate key for sensitive metadata
Nice. I hope they also include checksums for each block.
Famously missing, but not the hardest thing to add considering all the features above: Compression (which HFS+ supports!)