Of particular interest (to me) was the "Checksums" section:
> Notably absent from the APFS intro talk was any mention of checksums... APFS checksums its own metadata but not user data. ...The APFS engineers I talked to cited strong ECC protection within Apple storage devices. Both flash SSDs and magnetic media HDDs use redundant data to detect and correct errors. The engineers contend that Apple devices basically don't return bad data.
It's hard for me to imagine a worse starting point to conceive a new filesystem than "let's assume our data storage devices are perfect, and never have any faulty components or firmware bugs".
ZFS has a lot of features, but data integrity is the feature.
I get that maybe a checksumming filesystem could conceivably be too computationally expensive for the little jewelry-computers Apple is into these days, but it's a terrible omission on something that is supposed to be the new filesystem for macOS.
Their filesystem goals are in some ways consistent with Apple's (marketing) vision: Users would never have terabyte libraries of anything, as the various iServices would (should) be hosting that stuff in the cloud (where one presumes it is stored on a filesystem that actually includes data integrity). Since users won't be storing much of anything locally, Apple needn't care too much about data integrity. This is, of course, nonsense.
The idea that Apple's storage devices are error-free is arrogant--but even assuming that were true, there can still be bit errors in the SATA/PCI bus, errors in memory, race conditions, gamma rays, etc. Apple uses ECC memory on their Mac Pro, so obviously someone still believes that sort of thing is possible.
Literally nobody wants their files to be silently corrupted. ZFS made it much easier for (nerds like us) to attain very high levels of data integrity.
APFS was (and maybe still is?) a chance to make that the default for regular people.
With TB file systems, assuming you haven't outsourced everything to iCloud, data integrity matters. If you have, now you're trusting them not to screw up, ever.
From the movie or mp3 that mysteriously no longer plays, through to more important things - business data or family photos. I suspect many people have experienced bit rot, even if they don't recognise it as such. We've even reached the point where, going by quoted drive error rates, copying 2 TB from one drive to another will likely result in a bit flip (source: Ars Technica's ZFS/btrfs article from a couple of years back).
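As a rough sketch of that back-of-envelope claim: assuming an unrecoverable-read-error (URE) rate of 1 per 10^14 bits, a commonly quoted spec for consumer HDDs (this figure is my assumption, not taken from the Ars article, and real drives vary):

```python
import math

# Assumed consumer-HDD spec: 1 unrecoverable error per 1e14 bits read.
URE_RATE = 1e-14
bytes_copied = 2 * 10**12     # the 2 TB copy from the comment above
bits_read = bytes_copied * 8

expected_errors = bits_read * URE_RATE
p_at_least_one = 1 - math.exp(-expected_errors)  # Poisson approximation

print(f"expected errors: {expected_errors:.2f}")   # 0.16
print(f"P(at least one): {p_at_least_one:.1%}")    # ~14.8%
```

So at these assumed specs a single 2 TB copy has roughly a one-in-seven chance of hitting an unrecoverable error; enterprise drives with 10^-15-class rates cut that by an order of magnitude.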
Heck, most people have suffered some level of data loss from an HDD or flash drive failure - sometimes even when they tried to do all the right things. The only question is whether it was backed up; in the case of personal users, unlikely. Self-healing could have been quite some selling point!
I also happen to run a home file server on FreeBSD + ZFS, though I don't think that machine has ECC memory so it is still technically vulnerable to corruption.
And... the red herring here is, Apple users will want to plug in third-party storage. There's just no way to contain what someone will plug in to USB and Thunderbolt, and it's insane to think APFS would not be ready to help there.
For example, https://www.google.com/patents/US6289356 was filed in 1998, so I presume it's expiring fairly soon. Given that some of the original lawsuits were Network Appliance suing Sun/Oracle, I'm wondering how much of a role this played in the timing of the release of these features? After all, Apple could pretty much pick a window to release a new file system - nothing special about 2016, that they couldn't have done this in 2015 or 2017...
Which makes me wonder if there are data integrity patents that will expire, and at such time, Apple can drop the functionality into APFS. After all, they did say during their presentation that the flexibility of the data format is one of the key design features of APFS.
No idea if you're right, but it makes Apple's otherwise baffling stance plausible.
It's much easier to pretend that this is the case when the file system isn't verifying it.
Checksumming would probably expose problems that would otherwise go unnoticed by users or be blamed on computer gremlins. It's hard to say if doing the "correct" thing here would improve the subjective user experience. Maybe putting on airs of infallibility is the more profitable route.
Good checksumming to detect bit rot is exactly what is needed since as an owner of said laptop I have NO idea whether any of my data was affected.
If Apple want to say 'the majority of our devices are mobile and checksumming puts a large performance overhead' then that's one thing. But to claim it's not needed is just plain wrong and makes me worry that Apple's product managers sit in an echo chamber hearing only what they want to hear.
I guess that explains why my Mac recently had a bunch of daemons burning all cpu crashing repeatedly in a tight loop when getting sqlite errors on a db in ~/Library. Cause disk corruption never happens.
But on another level, I guess if hardware fails, then well, you buy more hardware, which is good for Apple. Presumably people who bought in the past from Apple won't turn around and buy an Acer or HP laptop. They'll still buy Apple.
It would be much nicer if your computer said, “I've detected a bit flip, please restore this file from backup”
I'll go further and say that a backup that forward-propagates corruption is not a backup either: all the incremental backups from the moment of corruption are worthless. Bottom line: if your backup cannot be restored with integrity intact, it's not a backup!
>>>> If your data gets silently corrupted and you keep backing up that corrupted data, eventually there won't be any backups left that have the original uncorrupted data.
>>> You shouldn't delete old backups!
>> Storage is too expensive to keep old backups forever.
> Backblaze! ... will delete backups after 30 days.
Yes. So 30 days after your file was corrupted, you will only have corrupted copies left.
I recommend sending every file system engineer on a year-long journey as a traveling system integrator.
Same deal with some heap protections. Say you're running a kernel that doesn't use byte patterns to detect heap overflows or use-after-free. Maybe you have some heap overflows which, by their nature, never cause any corruption - but now you turn on heap protections and people's kernels are getting more panics :/
If the fs detects a bit error does it flag the file as entirely unreadable? Move it to lost+found? Force me to restore the file from a backup? All these options seem more scary for an end user than blissful ignorance.
Don't misunderstand me, I've lost a few family photos over the years due to bit rot. So, I appreciate a fs that offers more protections. But, I honestly don't know offhand how an end user would recover from an error in /System or even an error in a family photo, or for that matter a word doc.
For files stored in iCloud Drive, if that version of the file exists in the cloud, the OS could automatically re-fetch the file. But, yeah, for lots of circumstances there's not going to be a "good" option to give the user.
EDIT: Same applies to Time Machine (or whatever Apple's backup solution will be called in the APFS era).
APFS has file-level encryption, so in theory you could detect a flip by selecting an authenticated encryption mode that errors out when decrypting modified data. I could see this being worked into apfs_fsck at some point.
A similar case could be made for adding it to the compression layer, which the OP thinks will be coming to APFS later; popular deflate-based formats such as gzip and zlib already have a checksum built in.
And in the Sun era you might be prepared to bet your business on not being on the wrong end of a lawsuit from the owner of the various patents and copyrights around Sun IP.
Only a completely insane person would argue that's a good idea now.
How can you have confidence in your backup if damaged data can be silently written to it?
Checksumming has another cost that isn't immediately obvious. Suppose you write to a file and the writes are cached. Then the filesystem starts to flush to disk. On a conventional filesystem, you can keep writing to the dirty page while the disk DMAs data out of it. On a checksumming filesystem, you can't: you have to compute the checksum and then write out data consistent with the checksum. This means you either have to delay user code that tries to write, or you have to copy the page, or you need hardware support for checksumming while writing.
On Linux, this type of delay is called "stable pages", and it destroys performance on some workloads on btrfs.
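A toy model of why the page must be stable (illustrative only; a real filesystem deals in page-cache pages and DMA, not bytearrays, and the checksum here is just CRC-32 for concreteness):

```python
import zlib

page = bytearray(b"A" * 4096)      # a dirty page in the page cache

# The filesystem computes the checksum it intends to store with the data...
checksum = zlib.crc32(page)

# ...but before the flush completes, user code writes to the same page.
page[0] = ord("B")

# What actually reaches disk no longer matches the stored checksum, so a
# later read would be flagged as corrupt even though no hardware failed.
assert zlib.crc32(page) != checksum

# The two remedies map to the options above: block the writer until the
# flush finishes ("stable pages"), or checksum a private copy of the page.
snapshot = bytes(page)             # copy first...
checksum = zlib.crc32(snapshot)    # ...then checksum the copy
page[1] = ord("C")                 # further writes can't invalidate it
assert zlib.crc32(snapshot) == checksum
```

Both remedies have a cost (write latency or an extra page copy), which is the trade-off the parent comment is pointing at.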
I would have hoped that a new filesystem with such wide future adoption would have come from a roomful of smart people with lots of experience of (for example) contributing to various modern filesystems, understanding their strengths and weaknesses, and dealing with data corruption issues in the field. This doesn't come across that way at all.
> With APFS, if you copy a file within the same file system (or possibly the same container; more on this later), no data is actually duplicated. [...] I haven’t seen this offered in other file systems [...]
To my knowledge, this is what cp --reflink does on GNU/Linux on a supporting filesystem, most notably btrfs, and it has been doing so by default in newer combinations of the kernel and GNU coreutils.
This guy seems too well-informed and experienced in the domain to miss something so obvious, though. So what am I missing?
Also interesting to me is the paragraph about prioritizing certain I/O requests to optimize interactive latency: On Linux this is done by the I/O scheduler, exchangeable and agnostic to the filesystem. Perhaps greater insight into the filesystem could aid I/O scheduling (this has been the argument for moving RAID code into filesystems as well, though, which APFS opts against) -- hearing a well-informed opinion on this point would be interesting. Unless this post gets it wrong and I/O scheduling isn't technically implemented in APFS either.
It seems like this perspective might be one written from within a Solaris/ZFS bubble and further hamstrung by macOS' closed-source development model. Which is interesting in light of the Giampaolo quote about intentionally not looking closely at the competition, either.
Maybe extending one of these existing filesystems to add any functionality Apple needs on top of its existing features (and, hopefully, contributing that back to the open source implementation) would cost more person-hours than implementing APFS from scratch. Maybe not.
Either way, we will now have yet another filesystem to contend with, implement in non-Darwin kernels (maybe), and this adds to the overall support overhead of all operating systems that want to be compatible with Apple devices. Since the older versions of macOS (OSX) don't support APFS, only HFS+, this means Apple and others will also have to continue supporting HFS+. It just seems wasteful of everyone's time to me.
What operating systems describe themselves as being "compatible with Apple devices"?
> Since the older versions of macOS (OSX) don't support APFS, only HFS+, this means Apple and others will also have to continue supporting HFS+.
Who else actually "supports" HFS+ ? Sure there are linux "ports" based on the spec but nobody claims them as being "supported".
Apple would have had to continue supporting HFS+ whether they chose to implement ZFS, btrfs or HAMMER.
>It just seems wasteful of everyone's time to me.
I don't know how Apple writing their own filesystem is wasteful of anybody else's time (except possibly Apple's, and/or disk utility software vendors for OS X).
The standard is the interface ( POSIX / SUS ) and unless APFS breaks that how is this applicable ?
I was referring to the Linux kernel modules implementing HFS+ and other Apple FSes.
> Who else actually "supports" HFS+ ? Sure there are linux "ports" based on the spec but nobody claims them as being "supported".
Yes, by support I meant other developers who want to be able to read and write to devices in APFS format.
> Apple would have had to continue supporting HFS+ whether they chose to implement ZFS, btrfs or HAMMER.
Yes, Apple would have to continue supporting HFS+, but other kernel developers would not have to port yet another filesystem (APFS) with all of its own quirks; and, who knows, maybe it would be less work for Apple to inherit ZFS/btrfs/HAMMER/some other filesystem's solutions to some of the same problems they're trying to solve from scratch here. My point was more that by reinventing the wheel to implement some of these features, they've created not just more work for themselves potentially, but more for the open source kernel development community as well in the long run.
> I don't know how Apple writing their own filesystem is wasteful of anybody else's time ( except possibly Apple's and/or Disk utility software for vendors for OS X)
APFS will find its way to external HDDs/SSDs/flash drives, etc., then in order to read those filesystems someone else will have to port it to any other devices/readers of that device/FS.
> The standard is the interface ( POSIX / SUS ) and unless APFS breaks that how is this applicable ?
I didn't mention POSIX, VFS, or filesystem _interfaces_. The analogy to the XKCD strip was that we already had N filesystems that have a large subset of (or in some cases superset of) the features of APFS as of right now, now we have N+1 complex filesystems to contend with and port and interoperate with in other kernels/OSes (mainly Linux + non-Darwin BSDs).
This may just be the price of progress, which is fine. I think it'll be fantastic if Apple makes progress in this area and improves upon the work of others. The developer seemed to be ignoring history so as not to "taint" himself (did he mean IP/legally tainted?), which is slightly worrying to me.
I hope Apple open sources their implementation under a BSD/GPL dual license to make it easier for others to port it directly into other kernels, rather than having to reimplement it themselves.
Apple has said from time to time that they're all about owning and controlling the key technologies that go into their products. APFS makes a lot of sense from that perspective, and this seems one of those cases where going their own way is better than importing someone else's constraints. ZFS on an Apple Watch? LOL.
One slide in the WWDC talk deck showed a bunch of divergent Apple storage technologies across all their platforms that are being replaced by APFS. If ZFS has to fork into weird variants to run well on the phone or watch, that seems less appealing than a single codebase optimized for just the stuff Apple products do.
I was reacting to the idea of APFS for macOS, as well as having yet another filesystem to deal with on external media that interacts with multiple computers (HDDs/SSDs/USB flash drives/etc.).
From FreeBSD Mastery: ZFS, p. 135:
"For a rough-and-dirty approximation, you can assume that 1 TB of deduplicated data
uses about 5 GB of RAM. You can more closely approximate memory needs for your
particular data by looking at your data pool and doing some math. We recommend
always doing the math and computing how much RAM your data needs, then using the
most pessimistic result. If the math gives you a number above 5 GB, use your math. If
not, assume 5 GB per terabyte."
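That rule of thumb falls out of simple arithmetic. As a sketch, assuming roughly 320 bytes per ZFS dedup-table (DDT) entry and a 64 KB average block size (both are my assumptions; real pools vary, which is why the book says to do the math yourself):

```python
# Rough reconstruction of the 5 GB-per-TB rule of thumb quoted above.
DDT_ENTRY_BYTES = 320            # assumed in-core size of one DDT entry
avg_block_bytes = 64 * 1024      # assumed average block size in the pool
pool_bytes = 10**12              # 1 TB of deduplicated data

entries = pool_bytes / avg_block_bytes
ram_bytes = entries * DDT_ENTRY_BYTES

print(f"~{ram_bytes / 1e9:.1f} GB of RAM per TB")   # ~4.9 GB
```

Smaller average blocks mean more entries and therefore more RAM, which is why pools full of small files can blow well past the 5 GB figure.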
Otherwise I think this is like the myth that ZFS requires expensive ECC RAM, whereas ECC RAM is recommended for any filesystem and ZFS needs it no more and no less.
ZFS, p. 549:
"However it is not designed for or well suited to run on resource constrained systems using 32 bit CPUs with less than 8 Gbyte of memory and one small nearly full disk, which is typical of many embedded systems."
What fs a laptop/workstation uses shouldn't be determined by what's suitable for a watch in 2016.
On the other hand, if Apple decides to open source the APFS implementation (hard to tell what their plans are from current statements, but I'm holding out hope), it'll probably be under a permissive license that allows porting to Linux. The implementation is in C (not C++) so porting is probably generally feasible. Compare to ZFS, which, even if some distros have finally started shipping it, will never quite be free of licensing issues unless Oracle does a 180.
Great article, but a couple of nitpicking corrections (which seem appropriate for a storage article):
Per: https://en.wikipedia.org/wiki/Terabyte - Terabyte is 1000^4, not 1000^3.
Also, it's been 6+ years since we all agreed that TiB means 2^40 (i.e. 1024^4) and TB means 10^12. Indeed, only in the case of memory does "T" ever mean 2^40 anyway; for both data rates and storage, T has always meant 10^12. This convention is strong enough that most of us have just thrown up our hands and agreed that, when referring to DRAM, "terabyte" will mean 1024^4, and 1000^4 everywhere else.
Indeed, in the rare case where someone uses TiB to refer to a data rate, they are almost without exception incorrectly using it, and, they actually mean TB.
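Spelled out, the two units the comment distinguishes:

```python
TB = 10**12        # terabyte (SI): 1000^4 bytes
TiB = 2**40        # tebibyte (IEC): 1024^4 bytes

print(f"1 TiB = {TiB / TB:.4f} TB")               # 1 TiB ≈ 1.0995 TB
print(f"a 2 TB drive is {2 * TB / TiB:.2f} TiB")  # ≈ 1.82 TiB
```

The ~10% gap between the two is the classic "my 2 TB drive only shows 1.8 T" complaint: the drive vendor counts in TB, the OS (sometimes) in TiB.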
No, it doesn't. APFS supports copying files, if you want that. It's just that the default in Finder is to make a “clone” (copy-on-write).
Is NTFS's shadow copy like Snapshots?
There are other FSes that allow the behavior that APFS is demonstrating - look at OCFS2 and Btrfs, both of which allow you to do cp --reflink.