Can you point to any other APFS issues that were reported before this one?
If you have more information about the problem you encountered and how it implicates/interacts with APFS, please do link to it. Otherwise, bug reports via circumstantial evidence are, while not inherently false, certainly suspect.
This is not reportable. All I got was a generic error and a hanging system. I can't reproduce it; I don't know why it started or why it stopped. Yet it was almost certainly an APFS issue.
Even if I wanted to play, my priority was to get the work laptop usable again.
I could be wrong, but I believe the point is not that it did happen, but that this *could* have happened many times in the past, with users just formatting and reinstalling without thinking about it.
That is because the likelihood of undetected hardware failures, given the layers and layers of ECC on the disks, links, etc., manifesting as filesystem metadata failures rather than garbage in the middle of video/image/document streams is really low. The more likely case is the machine's performance degrading due to read retries/ECC correction/retransmission, making it appear to have severe performance issues long before anything manifests as silent data corruption sufficient to eat the filesystem structure. It's a fun exercise to intentionally flip a few random bits on a hard-drive image (or in RAM) and see if/when they are detected.
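If you want to try that experiment, here's a minimal sketch (scratch.img and the byte offset are made up; don't point dd at a disk you care about):

  # make a throwaway 100 MB image; attach it, format it, and put files on it,
  # then flip one byte somewhere in the middle and see if anything ever notices
  dd if=/dev/zero of=scratch.img bs=1m count=100
  printf '\377' | dd of=scratch.img bs=1 seek=12345 conv=notrunc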
So, yes the first thing I think when I hear filesystem corruption is BUG! That is what the experience of tracking down a number of incidents in a large data storage application a few years ago taught me.
Can you be 100% sure it's not an APFS fuckup?
> Filesystem corruption is frequently silent, and every time it happens customers don't get on the phone and send the disks to Apple so that they can root-cause the problem. It's quite possible this bug has happened an untold number of times before it happened to someone who went through the effort to reproduce and isolate it.
Edit: Thanks for the downvotes. If you disagree, please tell me why. Apple's deployment of APFS to iPhones was so flawless, most people probably still don't even know they did it.
- the encryption password hint leak ("that was Disk Utility, not APFS")
- APFS volume erasure issues ("also Disk Utility")
- Adobe, Unity (editor and games built with it), Steam, Source Engine, and FCPX crash, performance, and asset loss issues on APFS volumes, all of which went away when moved to HFS+ volumes ("those teams should have adapted their software during beta")
- performance and incompatibility issues with spinning-disk drives ("platters are bad, APFS is designed for SSDs")
- RAID kernel panics, even on supported RAID 0 configurations ("that's corecrypto, not APFS")
File systems fall into the category of software where a bug can have disastrous consequences. Even if the probability of a bug is small, the magnitude of the consequence means that the overall risk is still high. And the current quality of software coming from Apple is so bad that the probability is not low.
For myself, I'm not letting APFS near my systems for at least a couple more years.
iOS devices are extremely constrained in a number of ways that macOS isn't - who knows how many other bugs failed to surface because Apple thought their iOS rollout was a 'job done' moment.
For people doing enterprise work and backups it's been a nightmare - here's one backup vendor that's been tracking issues with high RAM and CPU usage for almost 2 years now. Early on, if data grew past 2.0 TB, it would silently corrupt on certain cluster sizes when deduplication was enabled. Per the Veeam thread, the "fix" is only preventative, meaning that currently affected volumes will need to be reformatted entirely.
This doesn't excuse the APFS goofs, but silent data corruption and grinding servers to a halt just writing data are pretty major showstoppers, never mind that ReFS can't be used for a host of everyday operations (i.e., it's a storage-level solution, not really a daily-driver filesystem).
 - https://forums.veeam.com/veeam-backup-replication-f2/refs-4k...
 - https://blogs.technet.microsoft.com/filecab/2017/01/30/windo...
1. I had a FileVault-related corruption issue: the disk was eating itself up thinking it was encrypting... don't have the Apple discussion link at hand.
2. Time Machine hidden snapshots and "disk full" issues. It's a major pain for me that it's not possible to turn off local snapshots.
Try this terminal command:
sudo tmutil disablelocal
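Note that on High Sierra I believe that verb was removed; there you can only list and prune the snapshots individually, along these lines (the date string comes from the list output):

  tmutil listlocalsnapshots /
  sudo tmutil deletelocalsnapshots 2018-02-10-123456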
Anyway, I recently switched to Arch on a 2018 LG Gram and I'm not really missing anything. Battery life is great (8-12 hours of Firefox) and it has a quad core x64 processor for non-browser things.
Windows Ultrabooks are worth the purchase again.
Fits my needs as a MacBook Pro replacement. Will be running a Linux desktop on it too, probably elementary OS.
I keep around one Win10 laptop for gaming, but I prefer Linux for any real development work.
1. An APFS volume's free space doesn't reflect a smaller amount of free space on the underlying disk
2. The diskimages-helper application doesn't report errors when write requests fail to grow the disk image
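A rough way to see both for yourself, along the lines the article describes (paths, sizes, and filenames here are made up): create a sparse image bigger than the free space on the volume holding it, fill it past that free space, and compare checksums:

  # the sparse image advertises 100 GB even if the host volume has far less free
  hdiutil create -size 100g -type SPARSE -fs APFS -volname Test /Volumes/Small/test
  hdiutil attach /Volumes/Small/test.sparseimage
  # copy in more data than the host volume can actually hold, then compare
  cp bigfile.bin /Volumes/Test/
  md5 bigfile.bin /Volumes/Test/bigfile.bin

Per the article, the copy "succeeds", and until caches are flushed even the checksums can match.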
These are not even complex problems of the new format; Apple just forgot to have basic checks. It's like the root-access-with-an-empty-password incident that happened 2 months ago. Why do these serious but basic problems happen? What is going on with Apple?
(2) is the real issue here.
According to TFA, HFS+ sparsebundles reflect the limitations of their underlying volume, while APFS sparsebundles do not. Seems clear to me that this is a bug.
If you build a new filesystem, competent software engineers will heavily test the corner cases. What happens when the fs runs out of space? What happens when the metadata store runs out of space? Etc.
The original article mentions bugs that are pretty obvious cases to test. What precisely happens when you have a sparsebundle that exceeds the storage capacity of the containing volume? A PM needs to define what should happen and an eng needs to test that it does.
It's inexcusable that things like this aren't tested and is an organizational failure. This isn't some complex interaction of earbuds with watch and a cloud system. This is a very testable filesystem.
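And it doesn't take special tooling; a crude smoke test is just filling the volume and checking that the failure actually surfaces (mount point hypothetical):

  # keep writing until the volume is full; a correct stack must report the error
  dd if=/dev/zero of=/Volumes/Test/fill.bin bs=1m
  echo "dd exit status: $?"

If that exits zero, or the file later reads back short without an error ever being raised, something in the stack is swallowing failures.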
These things speak to organizational issues.
Even in the Linux world, things like btrfs, even though some distros consider it stable, are still treated with scrutiny. Back in the early 2000s, many Linux distros refused to install on XFS or JFS.
Apple's APFS rollout really does feel like it happened way too fast.
Like Ritchie, I go back to the days of when the Macintosh Operating System shipped on floppies and didn’t have pre-emptive multitasking or memory protection—everything ran in the same memory space. The entire system would crash pretty regularly due to INIT (system extensions) conflicts, for example.
I can count on one hand the number of times my Mac has kernel panicked over the last few years and I regularly run beta versions of macOS.
That’s a very low bar. At least Windows has the excuse of having to work with a bazillion drivers.
No, he quoted Sinofsky because he's one of the few people in the world who understands what it's like trying to operate at this scale, since he was at Microsoft during its heyday.
Corner cases that affect only 0.01 percent of the installed base aren't a big deal when you're operating at a few million devices; it's entirely different when it's more than a billion.
A common issue with Apple lately.
It's literally one of the most critical components of an operating system. Bugs in the filesystem or disk utilities are not small things. They have the potential to be disastrous.
# diskutil verifyVolume /dev/disk2s1
Started file system verification on disk2s1 macOS
Verifying file system
Volume was successfully unmounted
Performing fsck_apfs -n -x /dev/rdisk2s1
Checking the container superblock
Checking the EFI jumpstart record
Checking the space manager
Checking the object map
Checking the APFS volume superblock
Checking the object map
error: btn: invalid key (210, 16)
Object map is invalid
The volume /dev/rdisk2s1 could not be verified completely
File system check exit code is 8
Restoring the original state found as mounted
Error: -69845: File system verify or repair failed
Underlying error: 8: Exec format error
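For what it's worth, you can run the checker diskutil is wrapping directly (-n means check-only, no repairs; the device node is from the transcript above):

  sudo fsck_apfs -n /dev/rdisk2s1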
A filesystem should be able to last for decades (HFS was designed thirty years ago); I regard not having checksums in a brand-new filesystem as an over-optimistic tradeoff.
Edit: macOS disk images do have a checksum of the whole image data though. The issue mentioned in the article seems to be caused by an oversight in the disk image helper app, rather than in the APFS filesystem itself.
It wasn't an oversight; it was a deliberate design decision.
Other replies have noted that maybe it was a performance issue, etc. But I think it's something much different. The real reason is that there is more downside than upside to reporting these errors.
Users would be very upset if iOS told them that there was a bad block in one of their precious selfies from last month. But they might not even notice or care about a few bad pixels in the image itself.
I'm just telling you what one big probable rationalization was for this decision. I'd personally want to know, but people on HN aren't "average" iOS users.
Thankfully, I had been manually setting my sparse bundles back to HFS+ on creation, because I saw no reason to make them APFS containers.
The Finder in general ends up basically being useless for me for similar reasons; I have dozens of random dependency files I don't even recognize pop up in "All My Files".
Anecdotally it certainly seems like indexing is slower on my dev drive than anywhere else, so I'm curious.
The biggest offender for me when I touch a lot of files is Dropbox. It seems to use a lot of CPU when, e.g., an Xcode update is being installed. I've read that they had to listen to events for the whole volume because the more specific APIs weren't giving them the data they needed, but you'd think they could fast-path the files that were outside their sandbox.
Is your dev drive a platter drive or an SSD? I've found that the last few major releases of macOS have big performance issues on systems with old-school hard drives (frequent beach-balling, etc.).
Next time I turned it on, I couldn't get past the login screen (forever beach ball).
I put the SSD inside my old MBP as a slave drive to recover the data.
The SSD was corrupted, most data gone: shown in Finder but couldn't be copied.
I googled for solutions, but it seems I'm the first to experience this.
Maybe your unclean shutdown forced the async conversion from HFS+ to APFS to become forced-synchronous? Try leaving the drive in the Hackintosh machine, spinning at the login screen, for a few hours. Maybe it’ll “finish.”
This design decision was notable because other modern filesystems (ReFS, btrfs, ZFS) do feature additional integrity checks.
I guess the question is whether you suspect that Apple, which is famous for marketing and UI, is just smarter than the man-centuries Microsoft, Oracle, and Sun have poured into filesystem research, or whether this is just a bad design decision.
"Explicitly not checksumming user data is a little more interesting. The APFS engineers I talked to cited strong ECC protection within Apple storage devices. Both flash SSDs and magnetic media HDDs use redundant data to detect and correct errors. The engineers contend that Apple devices basically don’t return bogus data. NAND uses extra data, e.g. 128 bytes per 4KB page, so that errors can be corrected and detected. (For reference, ZFS uses a fixed size 32 byte checksum for blocks ranging from 512 bytes to megabytes. That’s small by comparison, but bear in mind that the SSD’s ECC is required for the expected analog variances within the media.) The devices have a bit error rate that’s tiny enough to expect no errors over the device’s lifetime. In addition, there are other sources of device errors where a file system’s redundant check could be invaluable. SSDs have a multitude of components, and in volume consumer products they rarely contain end-to-end ECC protection leaving the possibility of data being corrupted in transit. Further, their complex firmware can (does) contain bugs that can result in data loss."
(sorry for the edits, I finally found the paragraph my memory was referring to)
There are plenty of other reasons not to checksum user data, as it's a choice many have made, but "we trust the disk" is an invalid argument.
The latter may be more prevalent on the Hackintosh simply due to its being a different hardware environment. A disk-controller driver variation, or even having 2x as many cores as any Apple product, might be enough to trigger a latent bug.
So basically, I would be willing to bet that the vast majority of data corruption is happening due to OS bugs (not just the filesystem, but page management, etc.), with firmware bugs on SSDs a distant second. The kinds of failures that get all the press (media failures, link corruption, etc.) rarely corrupt data, because as they fail the first indication is simply failure to read the data back: the ECCs cannot reconstruct the data and simply return failure codes. Only once some enormous number of hard failures have been detected does it get to the point where a few of them leak through as false positives (the ECC/data protection thinks the data is correct and returns an incorrect block).
The one thing that is more likely is getting the wrong sector back, but overwhelmingly the disk vendors have gotten smart about encoding the sector number alongside the data (plus DIF for enterprise products), so that one of the last steps before returning a sector is verifying that the numbers actually match the request. That helps avoid the RAID or SSD firmware bugs that were more common a decade ago.
And given you are running on a Hackintosh there isn't much anyone can do given the unsupported hardware.
> Note: What I describe below applies to APFS sparse disk images only — ordinary APFS volumes (e.g. your SSD startup disk) are not affected by this problem. While the underlying problem here is very serious, this is not likely to be a widespread problem, and will be most applicable to a small subset of backups.
1. Update to High Sierra
2. Copy over files from the old Mac
3. Record about 40 GB of screen-share data using QuickTime (which is what he was doing when it crashed)
He spent hours on the phone with Apple; the tech said he had never seen anything like it, and they weren't able to recover his data... but after reading the other horror stories in this thread, there seem to be some serious problems with High Sierra and/or APFS.
This is a good lesson for everyone... obviously we all know to backup regularly (let Time Machine do its thing multiple times per day of course); but the lesson for me was when traveling, have everything you need backed up on a USB stick at least.
And yes, a high capacity USB stick. There are very small thumb drives that you can permanently leave in the USB-port of the laptop. Unfortunately I haven't found one like that with a USB-C connector.
That gave me even more vindication for my move. Also, you really don't know how fast your hardware is until you've used something other than macOS on it. From booting the system to launching software.. everything is snappier now.
Agreed re: snappiness of other OS. Ubuntu flies on my 2013 MBP.
The whole California series of OS releases has had a broken Finder. I see some fixes in High Sierra, but it's still buggy as heck for large file moves and broken scripting. I'm hoping they take a long, hard look at macOS like they seem to be doing with the next iOS. I can forgive removing some UNIX commands, but the general bugs and unexplained crashes are starting to get on my nerves.
Until 10.13.3 I could barely use my MBP; horrendous graphical corruption issues. How this can happen I have no idea.
To many techies, version n-1 is the best thing ever created, until version n+1 comes out and the "best ever" shifts up one revision.
Few if any people glorified Panther when Tiger came out, or Leopard when Snow Leopard came out, or Win3.11 when Win95 came out, or Win95 when Win98 came out, or WinME when WinXP came out, or Vista when Win7 came out.
Stop dismissing legitimate complaints just because you worship "new and shiny"
Yeah, right after you stop dismissing important security updates as "new and shiny".
It didn't start out that way. It was heavily criticized in the first instance. Certainly until SP1 was released.
High Sierra has essentially no enhancements whatsoever.
Every few months, just as soon as I start thinking maybe I should give in to Apple’s incessant nagging and dark-pattern prompts to upgrade, stuff like this comes up.
I think I’ll just wait for the new release, or for a time when I’m ready to wipe any machine I want to upgrade.
I haven’t tested if file corruption is the consequence, too, of copying more data into the disk image than the underlying disk has free space.
Can anyone explain the "slightly less than" part of this? Why wouldn't it just be "equal to"?
"The image filesystem currently only behaves this way as a result of a direct attach action and will not behave this way if, for example, the filesystem is unmounted and remounted. Moving the image file to a different volume with sufficient free space will allow the image's filesystem to grow to its full size."
hdiutil has some of the best man pages I've ever run across.
That was the most interesting/worrying part of TFA, and I would love to see the text clarify how the checksum tests were conducted.
Presumably, the "md5" commandline tool has no special fallback to the filesystem checksum cache (if it does, rather a lot of my life has been a lie, I'm afraid). Since that's the case, could we assume that, if the "lost" writes totalled $X GB of data, that any evil memory-caching of the file will only work in the presence of at least $X GB of free system memory (RAM plus swap).
I'd also be interested in learning what happens if there's less than that amount of memory available. Will the checksum fail? Will an error occur elsewhere? Will the system have some sort of memory (and swap) exhaustion failure/panic?
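One way to take the cache out of the equation (a sketch; the image path and mount point are hypothetical) is to detach and re-attach the image before checksumming, so md5 has to go back to the actual storage:

  md5 /Volumes/Test/bigfile.bin   # possibly served from the page cache
  hdiutil detach /Volumes/Test
  hdiutil attach ~/test.sparseimage
  md5 /Volumes/Test/bigfile.bin   # now read back from the image itself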
Seems to me we're all involved in a massive public beta.
If you're lucky, your customer notices that files are missing, understands that it has to be a bug in the operating system and maybe even has a rough idea what they were doing that caused the bug to occur, then calls up support and your first-level support is competent enough to direct the problem to the filesystem people and then it's still going to require a lot of luck for that department to reproduce the problem and to actually find out what in the code is causing it.
If anyone is curious: I use restic as backup client and Backblaze B2 as backup storage. Works well with sparse bundles.
On repeated backups, some backup software operates at the file level and uploads the whole file if it changed. So if you have a fixed-size 50 GB image, mount it, add a file, and unmount it, it has changed, and the whole 50 GB image file has to be uploaded (with some backup tools).
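Sparse bundles sidestep this because the "image" is really a directory of small band files (8 MB each by default, I believe), so a file-level backup tool only re-uploads the bands that changed:

  ls MyBackup.sparsebundle/bands | head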
As I set up my new computer I'll move stuff or delete things out of the sparse disk image from my Desktop, then periodically reclaim space.
I only do this every 5+ years or so, and I haven't done this with APFS yet either. But those are a few of my use cases that dd wouldn't cover.
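For the reclaim step, hdiutil has a verb for exactly that (path hypothetical):

  hdiutil compact ~/Desktop/stuff.sparseimage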
But now perhaps I better understand why Time Machine backups aren't supported on APFS.
What I describe below applies to APFS sparse disk images only — ordinary APFS volumes (e.g. your SSD startup disk) are not affected by this problem. While the underlying problem here is very serious, this is not likely to be a widespread problem, and will be most applicable to a small subset of backups. Disk images are not used for most backup task activity, they are generally only applicable when making backups to network volumes. If you make backups to network volumes, read on to learn more.
I didn't know there were APFS-formatted disk images (new in 10.13). Even when you consider the many different kinds of disk images that macOS supports, there's a pretty clear distinction between disk image and a backup of your startup disk, made to another partition in another drive.
Any additional clarification would get into "MacOS may lose data on APFS-formatted disk images (disk images, not disk-to-disk, as in another volume..." territory.
"may" lose data on "APFS-formatted" disk images.
I back up every day to my Synology NAS for example.
How do you like your Synology NAS? I’m considering it.
RAID (including the software RAID in Linux) doesn't actually do checksumming for file verification. AFAIK, ZFS is the only open system to do so.
The title is very clear, and the first paragraph which you quoted explains it in detail.
It's significant that an established respected company--the makers of Carbon Copy Cloner--will not support APFS formatted disk images for its backups.
While your agenda may be driven by the need to protect Apple, the rest of us need to know this important news about APFS so we can be fully informed.
People can have opinions without them needing to be flagged as having hidden agendas. And I have worked at Apple and can assure you that there isn't some payment structure for posting on forums.