I.e., a "unique copy-on-write design"
> Space Sharing
Basically, ZFS datasets.
If those can be sent: Finally Time Machine done right.
> The AFP protocol is deprecated and cannot be used to share APFS formatted volumes.
> An open source implementation is not available at this time. Apple plans to document and publish the APFS volume format when Apple File System is released in 2017.
> encryption models for each volume in a container: no encryption, single-key encryption, or multi-key encryption with per-file keys for file data and a separate key for sensitive metadata
Nice. I hope they also include checksums for each block.
Famously missing, but not the hardest thing to add considering all the features above: Compression (which HFS+ supports!)
> If those can be sent: Finally Time Machine done right.
If this ends up being true I can't wait—it's so frustrating watching tiny incremental backups take forever over the network. It seems like "Preparing" and "Cleaning up" take longer than moving the data.
While waiting for APFS to become stable, buy Carbon Copy Cloner. $40. I love it. It's fundamentally rsync, but tailored to OS X. For personal use a single license covers an entire household.
Every night CCC fires up on each laptop and each does an incremental clone to its own dedicated directory on my desktop machine. This clone is usually less than a few GB and takes about a minute to run. CCC doesn't have to be run daily, it can be told to run hourly instead.
Time Machine running on my desktop then copies everything off to yet another disk.
So my backup environment is:
- each laptop uses CCC to periodically clone to the desktop
- the clones on the desktop are "traditional"; CCC is told not to keep its own copies of modified files
- the desktop runs Time Machine, making hourly backups to a TM disk

Old versions of files can be found:
1) on the laptop
2) on the desktop
3) on the desktop's Time Machine Volume
If you create your users in the same order on each machine, or later go back and change the User IDs to match (under advanced options in Users and Groups) then the uids will match everywhere, and all files can be easily browsed in each location, with all file permissions intact.
This was my epic "there are two kinds of people: those who have lost data and those who will lose data" story. All my dad and mum's files were gone; luckily nothing professional and no pictures (analog cameras still ruled back then).
This is just a reminder that no tool is ever perfect. CCC is great; Time Machine is sluggish but "dumbproof" to a certain extent (though you have to keep faith in a black box).
Personally I'm pretty paranoid, so I perform my monthly off-site backup by rebooting my Mac into the recovery partition and doing a full disk copy to an external drive using Disk Utility (because ultimately that's the recovery tool I'll use if things go bad, so I need to use it for my off-site backup...).
It's really sluggish, so I run it while I sleep. I get that professionals with more than 1 TB will need to move on to more serious stuff anyway.
From my perspective the backup was a success... That traumatizing error is when I first learned about superuser and permissions. I guess CCC has come a long way since then and added more safeguards. But I must confess that I've stuck to "use the goddamn standard tools" ever since.
Sure, there must be plenty of more customizable or faster tools out there. But Time Machine stays "the backup tool my mum can't use wrong", and I can't give Apple enough credit for that. And putting NSA/FBI problems aside, iCloud backup for iPhone is exactly as smooth. Literally every time I go to a Genius Bar, someone is about to hug a genius because of a successful restore of their broken/stolen iPhone from iCloud.
This is how you build a "faithful" customer base, and no other tech company understands that better than Apple so far.
If it's not file system permissions or other configuration problems, then it will be a failing storage medium or just dumb user error. So the only way to be sure you have a working backup is to test that backup - ideally before you actually need to use it.
I got a little lost while writing, but indeed my point was: double-check! Because in my case everything looked fine at first sight.
This is also why I don't use the same process for my two backups (daily/monthly). To be fair, you can't constantly check that your daily/hourly backup isn't corrupted; you rely on its built-in safeguards and occasionally check it.
But by using two different methods for the daily vs. monthly (off-site) backups, you significantly reduce the odds that both methods fail at the same time, right when you need them. (Also, a monthly full clone is way easier to check than a Time Machine disk.)
Of course, the recovery failed. In fact, the backups hadn't been working properly for months. I lost a few hours of work; if our file server had actually failed, not having a good backup could have literally been catastrophic. As in would the company have survived?!
I recall an ancient quip, more or less: "if you don't test your backups, you don't have backups, you have dreams".
Here's a way to do it. You need to feel comfortable around the OS X terminal command line. You only rely on OS X's built in programs, so it's an independent way to check if your backup program is doing the right thing.
As root, from the top of the volume being checked, I used to do something like this:
find -x . \
-type f -print0 \
| xargs -0 -n 100 -x md5 \
| sort > /tmp/src.md5
There are some annoyances. E.g. (from memory) the ~/Library/Caches files aren't backed up, so they will be missing at the destination. I wrote some sed commands to first remove some of these from the resulting checksums to keep my diffs more manageable.
Instead of the above, I currently use a Python script that I wrote, which lets me fine-tune things. At the heart of it is (as root, of course) using Python's os.walk to traverse a directory tree. For each file I use hashlib.sha256 to generate a checksum. I also don't descend into certain directories. Etc.
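A minimal sketch of that kind of script (the names and the pruned directories here are just placeholders; the real thing needs error handling I'm glossing over):

    import hashlib
    import os
    import sys

    SKIP_DIRS = {"Caches", ".Trash"}        # directories not worth comparing

    def checksum_tree(root):
        """Walk `root`, yielding 'sha256  relative/path' lines."""
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune unwanted directories in place so os.walk won't descend into them.
            dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):        # skip sockets, broken links, ...
                    continue
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for block in iter(lambda: f.read(1 << 20), b""):
                        h.update(block)
                yield "%s  %s" % (h.hexdigest(), os.path.relpath(path, root))

    if __name__ == "__main__":
        for line in sorted(checksum_tree(sys.argv[1])):
            print(line)

Run it (as root) once against the source volume and once against the backup, redirecting each run to its own file.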
Using a few Python scripts to generate and process the source and destination checksums allows me to, at the end, use a simple
vimdiff source.checksums destination.checksums
Keep in mind that the perfect is the enemy of the "good enough". Just using the basic 'find' (and not a custom Python script) is plenty good. I used that method for many years.
Many errors stick out right away. E.g. if you have 1,000,000 files checksummed in your source but only 250,000 in your destination, then you quickly know that you screwed up.
I use Google Docs for all my docs and spreadsheets. Occasionally I use Excel, Word, or Keynote, but when I do I save the files to my Dropbox or Google Drive folder. My photos are synced with Google Photos, my music is synced through iTunes Match (or I can sync my music library files to Dropbox, or use Spotify), and all my code is in git repos pushed to GitHub or Bitbucket.
What else is there to back up? Is it a matter of not wanting to re-install apps manually to get your computer back to its current state? Or is it a matter of not wanting to pay for the extra storage space on dropbox?
- Some others have been burnt by the cloud failing them (corruption, data loss, copyright abuse, etc) and want an additional layer of security.
- There are many activities that aren't covered by the cloud at all, such as 3D / video / music editing, art in general, programming (all those dev environments)...
- You may want / need to be able to recover from data loss even if you are offline.
- Some file types don't match the cloud sync paradigm very well, such as system files and configuration, offline game saves, and all the stuff that needs to be at a precise place on the system.
- You've got stuff you want to manipulate as files in directories, not as entries in an app. Power users usually dislike the loss of control and freedom that apps imply.
- You like to have 3 backups because of the rule "one on-site, one off-site".
- You like to have all your backups in the same place.
- You have collections of files that just don't fit in the cloud, such as terabytes of videos.
- Sex tapes still need to be backed up, and you won't send that to your dropbox.
- Some people still have shitty internet connections and can't sync reliably.
- It's less work to manage one backup than to set up all the parameters of all those apps to be sure they sync only what you want.
- You read the license for some sync platform and couldn't decently click "I accept".
- You've got a NAS at home on which you plug the hard drive with the backups for the whole family.
For argument's sake, I'll focus on Google services. As I personally discovered recently, Docs offers minimal protection for your data:
1. Docs shared with you (others are "owner") can disappear without notice.
2. Manual clones are the only way to keep a copy of shared documents.
3. The GDrive agent keeps no local copies of docs, only urls.
4. Any deletions made >25 days ago are unrecoverable.
5. Deleted accounts have only a 5 day undelete window.
Similarly removal of items from trash in GMail/GPhotos is permanent. If someone maliciously gained access to your Google account (or one of your devices) they could quickly and easily purge your data. These deletions would quickly and efficiently propagate to all your devices and purge the canonical "cloud" copy.
Basically if you don't have any offline backups it is easy to become totally screwed. 2TB drives are cheap.
I've known so many people over the decades who trusted some online provider to reliably store the only copy of critical data, only to be burned. Even providers that you would have thought should be bulletproof.
My rule of thumb, based on data loss studies, is to have three copies of any critical data. Any cloud provider should only be considered to be one copy.
Documents and source code don't take up much space, unless you count music and video files, which iTunes can always sync to your Apple mobile device anyway.
I use a USB hard drive to back up my Thunderbird and Firefox profiles, and also my downloads and other stuff too big for Dropbox. $100 for a 4 TB USB 3.0 hard drive is cheap.
One of the biggest is iPhone backups from iTunes. If I were a normal user and my MacBook didn't boot up tomorrow, I'd be surprised to find my backup didn't include critical files like this. For example, apps that are no longer in the App Store but are in your backup can be restored. Once you don't have that backup anymore, they can't.
It's just one of the reasons Apple users are forced to have multiple backup services if they want reliable and complete backups.
P.S.: The latest app binary files may not get updated/synced on the Mac post iOS 9/iTunes upgrade.
It would be nice to have a utility that gives a definitive diff of what's on your hard drive but not on your Time Machine drive / [Carbonite, Backblaze, etc.] backup.
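A rough sketch of such a utility, assuming all you want is the set difference of relative file paths under two roots (real backup tools intentionally exclude caches and the like, so expect some noise):

    import os
    import sys

    def relative_files(root):
        """Return the set of file paths under `root`, relative to it."""
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    if __name__ == "__main__":
        source, backup = sys.argv[1], sys.argv[2]
        for path in sorted(relative_files(source) - relative_files(backup)):
            print(path)        # present on the source, absent from the backup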
That is really interesting. I noticed with El Capitan that it already defaults to SMB instead of AFP when you don't give a scheme.
While I imagine this might be retrofittable (netatalk manages it, after all), I can't blame Apple for wanting to ditch AFP. It's an ancient protocol at this point, and I'm frankly just amazed it still works at all.
HFS+ supports compression, so compression is not a new feature, so it's not on the New Features page. No problem!
Anecdotally, I'd always found AFP in the Tiger and Leopard days to be faster than whichever version of SMB support was included at the time. Now I use the default SMB3 and it seems that 802.11ac and gigabit are the bottlenecks (of course it's 10 years later, in the era of SSDs, as well).
And it wasn't just anecdotally faster; I worked for a storage company specializing in Mac workflows and AFP was empirically several times faster, especially on 1GbE and 10GbE networks. This was in part due to Apple ditching Samba in Lion/10.7 (http://appleinsider.com/articles/11/03/23/inside_mac_os_x_10...) over GPL concerns and replacing it with their own shitty, incomplete implementation, which they didn't get up to Samba's standard until 10.10.
I remember this being excruciating since customers had to either buy a third-party SMB implementation like DAVE to get any value out of 10GbE connections, or hope that the applications they wanted to use over the network supported AFP.
On-disk consistency ("crash protection"), snapshots, encryption, and transactional interfaces ("atomic safe-save") will no doubt be incredibly valuable. I don't think, though, that APFS will dramatically improve upon the time it took ZFS to mature from a first product to world-class storage.
Some commenters have opined (despite Apple distributing ZFS for Mac OS X at WWDC nearly a decade ago) that ZFS would never be appropriate for the desktop, phone, or watch. True, ZFS was designed for servers and storage appliances, but I don't think there's anything that makes it innately untenable in those environments--even its default, but not essential, use of lots of RAM.
Who knows... maybe Apple has spent the decade since killing their internal ZFS port putting this new filesystem through its paces. Its level of completeness, though, would suggest otherwise.
(That does leave me wondering how interesting an open-source APFS would be without an open-source Core Storage.)
hammer.ko is 494KB in the latest DFBSD, for those wondering the obvious.
It would be fantastic if they actively helped with HAMMER2 development.
Then Steve Jobs' buddy Larry bought Sun, and licensing of ZFS became basically impossible to sort out on time. So they dropped it.
Was that 2007 or 2009?
Either way, what file system didn't take years to get right? There are so many possible edge cases with file systems that it not only takes a long time to sort them out, but also an amazing community and/or luck to reproduce them deterministically and fix them.
Could speed up their time to market with such seasoned hands on board.
It will be interesting to see which license, if any, APFS is released under.
Sounds an awful lot like ZFS (zero-cost clones, read-only snapshots), but could it be? I would imagine they'd start from scratch due to IP issues.
Clearly this is immature technology they want to get out for testing/evaluation before it's fully adopted even into their own products. (See below.)
- - - from the release notes - - -
As a developer preview of this technology, there are currently several limitations:
Startup Disk: APFS volumes cannot currently be used as a startup disk.
Case Sensitivity: Filenames are currently case-sensitive only.
Time Machine: Time Machine backups are not currently supported.
FileVault: APFS volumes cannot currently be encrypted using FileVault.
Fusion Drive: Fusion Drives cannot currently use APFS.
Being energy hungry relative to UFS and others is likely true due to things like checksum calculations and compression, but there is no way to implement these things without needing more cycles to compute them.
Not so true now - people have added encryption and compression instructions to CPUs. I'd be surprised if Apple couldn't ask Intel for a couple opcodes, and with the mobile platforms they do it anyway.
o3x builds and runs just fine with -O2 -march=native and the latest clang just by changing CC and CFLAGS; the kexts that get built aren't backwards compatible though (you'll get a panic if you build with -march=native on a machine that does AVX and run on a machine that doesn't).
The code that recent clang+llvm generates makes heavy use of the XMM and YMM registers, and does some substantial vectorization. The compression and checksumming and galois field code that's generated is strikingly better, although not quite as good as the hand tuned code in e.g. (https://github.com/zfsonlinux/zfs/pull/4439). It may be interesting to compare performance, but given that compression=lz4 and checksum=edonr has negligible CPU impact on a late 2012 4-core mac mini (core i7) even when doing enormous I/O (> 200k IOPS to a pair of Samsung 850 PROs), hand tuning likely won't make as much of a difference as moving up from compression=on, checksum=[sha256|fletcher4].
I'm pretty sure that once the hand tuned stuff is in ZOL it'll get looked at by lundman for possible integration.
Getting ZFS to run on the Apple Watch is definitely possible. I am not sure what "acceptably" means here; it is an ambiguous term.
> To use ZFS, at least 1 GB of memory is recommended (for all architectures) but more is helpful as ZFS needs lots of memory.
Is that inaccurate?
There are two additional things to consider.
ZFS uses RAM mostly for aggressive caching, to paper over both spinning disks and the IOPS tradeoff vdevs make relative to traditional RAID arrays. Thus low memory is not such a big deal if you have a pool with a single SSD or NVMe device.
The other point to consider is that on at least the non-Solaris-derived platforms, the VFS layer does not speak ARC. So data is copied from an ARC object into a VFS object, taking up space in both. If you are able to adapt your platform to use the ARC as a direct VFS cache, you can save RAM that way as well.
As for the recommended amount of system memory, recommended amounts are not the minimum amount which code requires to run. It in no way contradicts my point that the code itself does not need so much RAM to operate. However, it will perform better with more until your entire working set is in cache. At that point, more RAM offers no benefit. It is the same with any filesystem.
Otherwise, it is basically the SLUB memory block allocator that was used in the Linux kernel for a while. So yes, it can run on a watchOS-level amount of RAM.
Well, unless your medium has no seek penalty, which is what hurts with deduplication. Dedup on SSDs is pretty much OK, as long as your checksum performs reasonably (skein is reasonable; sha256 is not).
DDTs that fit inside no-seek-penalty L2s don't hurt that much either, and big DDTs on spinny-disk pools are acceptable with persistent l2arc, although it's risky because if the l2 fails, especially at import, you can have a big highly deduplicated pool that isn't technically broken but is fundamentally useless if not outright harmful to the system it's imported (or ESPECIALLY attempting to be imported) by. "No returns from zpool(1) or zfs(1) commands for you today!"
When eventually openzfs can pin datasets and DDTs to specific vdevs (notably ones made out of no-seek-penalty devices), heavy deduplication on big spinny disk pools should be usable and reliable.
Until then, "well technically even if you have only ARC and it's very small, it will work, just slowly" while correct in the normal case, is unfortunately hiding some of the most frustrating downsides when things go wrong.
The author meant deduplication, but that recommendation is wrong. A rule of the form "X amount of RAM per Y amount of storage" that applies to ZFS data deduplication is a mathematical impossibility.
You could need as little as 40MB of RAM per TB of unique data stored (16MB records) or as much as 160GB of RAM per TB of unique data stored (4KB records), both assuming default ARC settings. Notice that I say unique data and talk about records rather than simply saying data; there is a difference between the two. If you want to deduplicate data and maintain a certain level of performance, you will want to make sure RAM is sufficient to have a relatively high hit rate on the DDT. You can read about how to do that in my other post:
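To make the arithmetic behind those figures explicit, here is the back-of-the-envelope version (the ~640 bytes of ARC per DDT entry is simply the constant implied by the numbers above, not a guaranteed figure):

    # DDT RAM needed per TiB of unique data, assuming ~640 bytes of ARC per entry.
    def ddt_ram_per_tib(record_size_bytes, bytes_per_entry=640):
        entries = (1 << 40) // record_size_bytes      # unique records in 1 TiB
        return entries * bytes_per_entry              # bytes of RAM for the DDT

    for size in (16 << 20, 128 << 10, 4 << 10):       # 16 MiB, 128 KiB, 4 KiB records
        print(size, ddt_ram_per_tib(size) / (1 << 30), "GiB per TiB")
    # 16 MiB records -> ~0.04 GiB (40 MiB); 128 KiB -> 5 GiB; 4 KiB -> 160 GiB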
It is not straightforward, and it depends on knowing things about your data that you probably do not. There is no magic bullet that will make data deduplication work well in every workload or make its requirements easy to calculate. However, if the data is already on ZFS, the zdb tool has a function that can figure out what the deduplication ratio would be, provided there is sufficient RAM for the DDT, which makes it impractical to run on a pool that is large relative to system memory.
ZFS' data deduplication is a very strict implementation that attempts to deduplicate everything subject to it against everything else subject to it and do so under the protection of a merkle tree. If you want it to do better, you will have to either give up strong data integrity or implement a probabilistic deduplication algorithm that misses cases. Neither of which are likely to become options in ZFS.
Anyway, deduplicating writes in ZFS is IOPS intensive, which is the origin of poor performance. There are 3 random seeks that must be done per deduplicated write IO. If the DDT is accessed often, it will find its way into cache and if all of those seeks are in cache, then your write performance will be good. If they are not in cache, you often end up hitting hardware IOPS limits on mechanical storage and even solid state storage. That is when performance drops.
If you are writing 128KB records on a deduplicated dataset on hardware limited to 150 IOPS, you are only going to manage 6.4MB/sec when you have all cache misses. If your records are 4KB in size, you will only manage 200KB/sec when you have all cache misses. However, ZFS will continue to operate even if every DDT lookup is a cache miss and you are hitting the hardware IOPS limit.
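The throughput numbers are just that seek arithmetic; a quick sanity check:

    # Worst-case dedup write throughput when every DDT lookup misses the cache:
    # each write costs `seeks_per_write` random IOs out of the hardware IOPS budget.
    def dedup_write_throughput(record_size_bytes, hw_iops=150, seeks_per_write=3):
        writes_per_second = hw_iops / seeks_per_write
        return writes_per_second * record_size_bytes          # bytes per second

    print(dedup_write_throughput(128 * 1024) / 1024)   # 6400.0 KiB/s -- the ~6.4 MB/sec above
    print(dedup_write_throughput(4 * 1024) / 1024)     # 200.0 KiB/s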
It's a performance guide, not a requirement.
The GB-per-TB rule of thumb exists so that on-disk files can be moved around or staged in RAM before a write operation, and so that open or recently used files can be precached in RAM while being streamed; it isn't really about the pool's total capacity.
For things like compression, hashing, encryption, block defragmenting and other operations, ZFS uses a lot of caching and indexing to avoid bottlenecks.
If Apple decides to implement APFS on RAID at the software level, e.g. to combat bitrot or to sell consumer/business NAS/SAN products (upscaled Time Machine services for VMs), there are going to be questionable setups and comparisons to FreeNAS, Synology, unRAID and other software storage options with a mixture of technology and hardware, where Apple won't be flexible or adaptive.
Arguing for ZFS requires understanding more about ZFS usage and performance scenarios.
It is very possible to run ZFS RAID-Z1 on 2 GB of RAM or less, even for a 32 TB pool; e.g. anyone can run 5x Seagate 8 TB SMR archive drives in RAID-Z1 on 2 GB of RAM.
It is usable. FreeNAS regularly hosts builds on less-than-optimal hardware setups.
It's also usable with 4 GB, 8 GB, 16 GB, or 32 GB of RAM, with varying performance benefits as features are enabled and the cache is expanded to handle ARC or LRU (recently used) files/pages/blocks.
Usually, ZFS performance is measured as throughput from empty to 90% full, and it changes drastically under these conditions when cache is limited.
On a system like this with SMR "archive" drives the problem often is having a reliable cache of write data and, ideally, fewer fragments to store asynchronously; i.e. writing large files or modifying a large block is disk-IO limited. If it's being used to store archives, up to and including media files as a consumer device would, an optimal RAM size is hard to guess, given that people might store Blu-ray or UHD ISO files of ~40 GB versus DVDs of 4-9 GB, and streaming reads/writes of linear files would not use significant random IOPS.
With DB or VM storage, and consistent file blocks being written, the use case and performance requirements are just going to be different again, and this is where the 1 GB per TB rule is both useful and unhelpful for diagnosing requirements.
ZFS has a lot of bottlenecks, usually CPU, RAM and IOPS, but people focus on RAM, since it is so much harder to expand or scale. And performance does not scale linearly with it.
Regardless, it's just impossible to guess optimal use in a practical way, since there's almost no caching at all under 2 GB: the ARC is very limited, and kernel panics are possible when memory is not tuned or limited to avoid expansion, so performance then usually depends on the CPU rather than the disks.
At the high end of usage, performance can be managed by different methods such as L2ARC, ZIL, more RAM, more CPU, different pools, etc. Each with caveats and usually, non linear benefits.
Many NAS units that come with 2 GB of RAM are capable of running ZFS; the problem is performance.
It's even possible to run ZFS on less than 1 GB of RAM, but it's not going to be reliable or predictable unless you restrict the conditions of usage, i.e. limiting max file sizes, restrictions on vdevs or IOPS, etc. It would require heavy tuning for the intended task.
Especially if you start to hit the maximum storage limits of the pool, performance can be brutal without the caching features - lower than 100 KB/s when the ARC is busy or unoptimised. Usually whatever the CPU can deliver from the drive IO without an IO or file cache will be very slow on NAS-level hardware, because a traditional NAS isn't CPU bound.
Essentially, at the point where you can't start or run the performance features, there's no benefit from ZFS or CoW on smaller embedded devices unless it is truly needed.
From memory and experience, you can use half a GB per TB of storage on Z1 storage with some caveats and have usable performance, as long as you keep file size and IO in mind.
With 4 TB or larger drives, Z2 is recommended because of the impact of a drive failure on pool integrity; the rebuild/resilver times and error probability alone could allow data to be changed or corrupted during the resilver process.
This is just to combat entropy when reading terabytes of data and creating new checksums, given the probabilities involved with magnetic storage. Current and future drive density almost guarantees that errors will occur with the entropy and decay of magnetic storage.
With deduplication, it needs to store files with multiple hashes and caches per device and pool, which inflates sizes. About 5 GB per TB is a good start. In most cases you would never require dedup, as it has an extreme cost and a narrow use case.
ATP will just be a concert of dings.
This one is confusing, because this is a logical volume feature. Not sure how or why APFS would ever care that some layer above it is encrypting stuff.
On the other hand, if the new ability to partition drives with flexible partition sizes includes separate encryption keys per partition, and encryption/decryption is done by the block driver, they may have work to do to keep that block driver informed about what blocks should get encrypted with what key.
> APFS supports encryption natively. You can choose one of the following encryption models for each volume in a container: no encryption, single-key encryption, or multi-key encryption with per-file keys for file data and a separate key for sensitive metadata. APFS encryption uses AES-XTS or AES-CBC, depending on hardware. Multi-key encryption ensures the integrity of user data even when its physical security is compromised.
Most Apple volumes are already logical volumes (check diskutil list).
FileVault itself is, right now, implemented as part of corestorage (see diskutil cs for the encryption/decryption commands).
I assume they decided they just wanted to go the entire ZFS route and get rid of core storage in favor of a pool model, but still ...
It's wishful thinking, I know.
\o/ Hallelujah, something modern!
Hopefully this will change in the next macOS release.
I've used Btrfs many times since the start, and been burned by dataloss-causing bugs each and every time, so I'm quite cautious about using or recommending it. I still have concerns about its stability and production-readiness. If in doubt, I'd stick with ext4.
- Stability, reliability: ext4/XFS
- CoW, snapshots, multi-drive FS: ZFS/btrfs
- SSD speed, longevity: F2FS
Lately, Microsoft is showing that they aren't afraid to break things in the name of progress. If W10 is indeed the last version of Windows, maybe that's okay.
The harshest part was not coding or the UI; it was determining which versions of the different Windows APIs had a remote chance of working together smoothly. (It involved DB drivers and data grids.)
Although ExFAT is at least somewhat promising.
Anyway. Overall, I think it's safe to say hardware control doesn't make most of filesystem development much simpler or easier. But there are a few interesting places where it arguably does!
It also doesn't help to control 100% of the built-in storage if anybody can still plug in $GENERIC_USB_MASS_STORAGE_DEVICE and expect to use the same filesystem.
Having full and direct low-level control of on-board SSDs could very well be advantageous for performance and longevity of the flash on modern macbooks. Things like combining TRIM with low-level wear leveling etc.
Moving the wear leveling code into the OS where the filesystem can see it is an interesting idea but why aren't we doing that for all SSDs and operating systems then?
Why shouldn't we also demand standard low level primitives so that every OS can do the thing you're describing?
As I recall, HFS+ was modified to support directory hard links, which are less common in the Unix world, explicitly to support this feature.
TM also maintains a folder called /.MobileBackups to store temporary backups while your backup drive isn't connected. OS X also maintains /.fseventsd, a log of file system operations that TM can use to perform the next incremental, instead of having to compare each file for modifications.
The bootloader/bootmanager is what determines what fs choices you have for /boot. GRUB2 reads anything, including ZFS, Btrfs, LUKS, even md/mdadm raid5/6 and even if it's degraded, and conventional LVM (not thinp stuff or the md raid support).
> The default is false, except git-clone(1) or git-init(1) will probe and set core.ignoreCase true if appropriate when the repository is created.
However, the transition between the case insensitive and case sensitive filesystems isn't going to happen overnight. People will be copying files around both ways for quite some time, so the insensitive -> sensitive case is still going to be a concern.
$ touch HI
$ touch hi
$ touch HI
$ test -f hi && echo ok
> many case insensitive hard drives would be copied into new machines and there would be millions of conflicts
I still don't see where you get a conflict copying the contents of a case-insensitive file system to a case-sensitive one.
Because some apps create MyFile.txt and expect to be able to access it later by myfile.txt. Adobe's applications, for example.
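A contrived illustration of that failure mode (hypothetical filenames):

    from pathlib import Path

    Path("MyFile.txt").write_text("settings")   # the app saves with one spelling...

    # ...and later opens with another. This succeeds on a case-insensitive
    # filesystem (the HFS+ default) but raises FileNotFoundError on a
    # case-sensitive one.
    print(Path("myfile.txt").read_text())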
How would it work? By a combination of magic and "we can't be bothered; the users should figure out something".
Btrfs is not stable enough IMO for something like SteamOS.
> Case Sensitivity: Filenames are currently case-sensitive only.
First thought: they have seen the light!
A moment later: wait...they consider this a "limitation", and it's only "currently" the case. So maybe they're going to perpetuate the brain-damage anyway.
Backwards compatibility is going to end up trumping whatever ideological purity case sensitivity represents.
How do you handle the case where the only difference between two file names is that one uses Latin wide characters and the other uses Latin characters? This one bit me when writing a CAPTCHA system back in 2004. (Long story, but existing systems wouldn't work between a credit card processing server that had to validate in Perl, and a web form that had to be written in PHP, where the two systems couldn't share a file system. It's simple enough to do using HMAC and a shared key between the two servers, but for some reason, none of the available solutions did it.) I noticed that Japanese users had a disturbingly high CAPTCHA failure rate.

It turns out that many East Asian languages have characters that are roughly square, and most Latin characters are roughly half as wide as they are tall, so mixing the two looks odd. So, Unicode has a whole set of Latin wide characters that are the same as the Latin characters we use in English, except they're roughly square, so they look better when mixed with Unified Han and other characters. Apparently most Japanese web browsers (or maybe it's an OS-level keyboard layout setting) will by default emit Latin wide Unicode code points when the user types Latin characters.

Whether or not to normalize wide Latin characters to Latin characters is a highly context-dependent choice. In my case, it was definitely necessary, but in other cases it will throw out necessary information and make documents look ugly/odd. Good arguments can be made both ways about how a case-insensitive filesystem should handle Latin wide characters, and that's a relatively simple case.
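For anyone who hasn't bumped into them: the fullwidth forms really are distinct code points, and it's NFKC (compatibility) normalization that folds them back to ASCII. A small illustration:

    import unicodedata

    wide = "ＡＰＦＳ"                       # fullwidth Latin capitals (U+FF21 etc.)
    folded = unicodedata.normalize("NFKC", wide)

    print(wide == "APFS")                   # False: different code points
    print(folded == "APFS")                 # True after compatibility normalization
    print([hex(ord(c)) for c in wide])      # ['0xff21', '0xff30', '0xff26', '0xff33']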
Most users don't type names of existing files, exclusively accessing files through menus, file pickers, and the OS's graphical command shell (Finder/Explorer). So, if you want to avoid users getting confused over similar file names, that can be handled at file creation time (as well as more subtle issues that are actually more likely to confuse users, such as file names that have two consecutive spaces, etc., etc.) via UI improvements.
The information currently is very scarce on this one, but I hope they would at least test it REALLY WELL.
And AFAIK, GNU Grub hasn't done a release in about four years now, and all the distros are using their own, custom beta build of it. It's a bit of a mess.
I realize new file systems are difficult, but HFS+ is just an ancient mess that has needed replacing for a long while. This isn't new and innovative so much as finally getting around to removing technical debt and catching up with the rest of the world.
Windows and WinFS is a bad comparison. WinFS was just a tagging/metadata system on top of NTFS with a SQL storage backend. We're still quite far from being able to tag files with custom metadata and query it easily from the default file chooser dialogs.
I would have greatly appreciated being able to use ZFS with MacOS X, for datasets, snapshots, sending them to remote pools for backup etc. It would have made it directly interoperable with a lot of pre-existing and cross-platform infrastructure. (I couldn't care less if it didn't scale down to the "watch". Filesystems are not a one-size-fits-all affair.) I find it great that I can take a set of disks from e.g. Linux, run "zpool export", pull them out, and then shovel them into a FreeBSD system, run "zpool import" and have the pool and datasets reassembled and automatically mounted. Perfectly transparent interoperability and portability. While Apple like to do their own thing, this is one place I would have definitely appreciated some down to earth pragmatism and re-use of existing battle-tested and widely used technology.
There's also a decent chunk of what ZFS supports (particularly flexible volume pool management) that would be useless on almost every machine that Apple makes and sells. Your example of pulling a drive out of one machine and putting it in another is either impossible (soldered-on storage) or highly unlikely (user-serviceable SSD, but hidden behind a bunch of pentalobe screws) with their modern hardware lineup.
Is it that Apple is a much bigger target? Patent issues?
Given what we know about Oracle that was probably the right call.
* best support modern hardware technologies,
* start from the state of the art in file systems (like ZFS and btrfs, old as they may be), and also
* emphasize security.
Google them and see what they do well, and you might get some ideas.
- On OS X HFS+, filenames are stored using a "variant" of NFD, where some characters are precomposed "for compatibility with old Mac text encodings" (https://developer.apple.com/library/mac/qa/qa1173/_index.htm...).
- On Windows NTFS, filenames are "opaque sequences of WCHARs", and are thus "kind of" UTF-16 with no formally required normalization format. Windows itself tries to use NFC, but applications are free to use the Windows APIs to create a filename with anything they like. Since filenames are just sequences of 16-bit WCHARs, unpaired surrogates are allowed (and can break all sorts of code!)
- On Linux, filenames are opaque sequences of 8-bit characters. The only requirement is that a filename not contain either a slash or NUL character. No other formal specification exists, although "most" users these days use UTF-8. (However, you can and will find loads of filesystems with invalid UTF-8, usually because filenames are in one of the ISO encodings instead).
Multiple programming languages have been bitten by the possibility of invalid Unicode in filenames (see for example: rust (https://github.com/rust-lang/rust/issues/12056), Python (https://www.python.org/dev/peps/pep-0383/)). This mess is pretty much never going to go away, either, because filesystems are extremely durable and long-lasting.
More details here http://serverfault.com/a/427200
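To make the NFC/NFD difference concrete, here are the raw sequences involved (this is just the Unicode behavior, independent of any particular filesystem):

    import unicodedata

    nfc = unicodedata.normalize("NFC", "café")    # precomposed U+00E9
    nfd = unicodedata.normalize("NFD", "café")    # 'e' + combining acute (U+0301)

    print(nfc == nfd)                  # False: equal to a human, unequal code-point-wise
    print(nfc.encode("utf-8"))         # b'caf\xc3\xa9'
    print(nfd.encode("utf-8"))         # b'cafe\xcc\x81'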
PS: I've been using LC_CTYPE="whatever.ISO8859-1" and an ISO-8859-1 Terminal.app locale forever since I seem to keep dragging a bunch of legacy filenames around (having started on MS-DOS and FreeBSD 2.2) and ISO-8859-1 still seems to be the only locale that lets me "see the bytes" matrix-style instead of a random amount of "?" chars. Curiously, Finder.app seems to keep up very very well despite the odd encoding. Crossing my fingers the new APFS will act more like Linux.
PPS: Java is especially hilarious when launched with -Dfile.encoding=utf-8 as it is literally impossible to access some files from there.
And don't get me started on .DS_Store :P
With Space sharing and multiple logical volumes there will probably be a shift to encrypting users home directories separately.
Probably even an unencrypted System Base, which is read-only and protected by rootless anyway, so the system can boot without user interaction.
This also finally allows the most user friendly implementation of file encryption: Encrypted Folders, supported by the OS.
Sounds cool, I'd be willing to help with a Rust port once the source is available :)
(case sensitive only - naturally)
There's no such thing as "the block doesn't decrypt" absent MACs/MICs or AEAD schemes -- encryption and decryption are just maps from N bytes to N bytes.
Encryption is not authentication, so it does not prevent unnoticed modification.
And secure boot only really depends on authentication and not on encryption, so you are conflating two different concepts.
No, you can't just trust the data; replay attacks over time and space are still an issue.
It's certainly more secure (indeed, it prevents replay attacks) to just keep a big block-hash table, update it when blocks change, and then hash that table and sign it on fsync - but it's costly in a few ways compared to just trusting unauthenticated encryption, and was even more so five-to-eight years ago when Secure Boot was being formulated.
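A toy sketch of that bookkeeping, just to make it concrete (in-memory only, hypothetical names; a real design has to persist the table and protect the signing key):

    import hashlib, hmac, os

    KEY = os.urandom(32)                 # signing key (would live in a secure enclave)
    block_hashes = {}                    # block number -> SHA-256 of block contents

    def write_block(n, data):
        """Update the per-block hash table whenever a block changes."""
        block_hashes[n] = hashlib.sha256(data).digest()

    def fsync_signature():
        """Hash the whole table and sign it; store this alongside the superblock."""
        table = b"".join(block_hashes[n] for n in sorted(block_hashes))
        return hmac.new(KEY, table, hashlib.sha256).digest()

    def verify_block(n, data, signature):
        """Trust a read only if its hash is in a table whose signature checks out."""
        return (hmac.compare_digest(fsync_signature(), signature)
                and block_hashes.get(n) == hashlib.sha256(data).digest())

    write_block(0, b"root directory")
    sig = fsync_signature()
    print(verify_block(0, b"root directory", sig))   # True
    print(verify_block(0, b"tampered data", sig))    # False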
These days, you see a lot of wholly-signed read-only OS images - the OS X recovery partition is signed; CoreOS signs its OS images; most firmware is signed; etc. But I don't expect the unauthenticated encryption on most computers' read-write rootfs to be replaced by a signed-but-unencrypted filesystem any time soon - if just for the fact that consumers really seem to hate the idea of separate OS and data partitions, especially when the OS partition is "stealing space" they could be using for data. (The only thing I can think of that might finally kill this is making the default install on some consumer OS create a thin pool, such that an OS partition that only contains 5G of data only "steals" 5G of their "space.")
Or, y'know, authenticated encryption. Do any block-device cryptosystems support an AEAD mode yet? LUKS maybe?
No, you cannot. That is your sentence, not one from the Secure Boot people.
Also, I assume that APFS will support encrypted and unencrypted logical filesystems in the same space-sharing instance, so the separate OS partition is just a logical FS that happens to be unencrypted - which is what I meant in my original post.
GELI and AES-GCM are authenticated. Not sure if GCM has properties equivalent to GELI's HMAC feature, but it's probably good enough.
FreeBSD's geli supports authenticating data with HMACs.
Thank you for injecting sexism into a technical discussion.
This is not a small thing. We had a nice visual overhaul two years ago; now Apple needs to pick things up at the under-the-hood level.
AFP deprecated too.
There are many limitations in the developer beta so this is clearly still very much a work in progress. Getting these file-systems right is traditionally difficult and can take years (see ZFS, BTRFS) so it will be interesting to see how well it does.