
MacOS may lose data on APFS-formatted disk images - mpweiher
https://bombich.com/blog/2018/02/15/macos-may-lose-data-on-apfs-formatted-disk-images
======
torstenvl
Yes, fairly limited impact, but still. Data loss is fucking important. We
aren't talking about an occasional stutter when moving windows or a sound
driver that sometimes needs a reboot to resume working. This is a major bug. I
would really really like it if Apple would get their shit together so that
people who actually rely on their computers to work correctly can upgrade at
some point.

~~~
toasterlovin
As a counterpoint, APFS was deployed to almost a billion devices over the
span of several months when it was released and, IIRC, this is the first
major issue.

~~~
StillBored
How do you know that? Filesystem corruption is frequently silent, and every
time it happens customers don't get on the phone and send their disks to
Apple so that Apple can root-cause the problem. It's quite possible this bug
has happened an untold number of times before it happened to someone who
went through the effort to reproduce and isolate it.

~~~
freehunter
>How do you know that?

Can you point to any other APFS issues that were reported before this one?

~~~
jandrese
My wife's laptop suddenly decided that the boot drive was corrupt a few weeks
after she updated to APFS. None of the recovery tools were of any use. We had
to reinstall the OS and pull the files from a backup. This story did not make
it to Hacker News.

~~~
zbentley
Anecdata != filesystem issue.

If you have more information about the problem you encountered and how it
implicates/interacts with APFS, please do link to it. Otherwise, bug reports
via circumstantial evidence are, while not inherently false, certainly
suspect.

~~~
hueving
You missed the point. The anecdote was to illustrate how even power users
might be working around filesystem bugs so a lack of bug reports specifically
mentioning APFS is certainly not proof that there aren't problems.

~~~
viraptor
It's also quite hard to report fs issues. I ended up one day with a
non-working APFS system. Boot was OK, but I couldn't mount the user
partition. The APFS repair tool just failed and made the system hang. After a
number of restarts, attempts at repair, and attempts to move the partition
somewhere it could be decrypted, everything started working. And I actually
had enough experience to try to debug/fix it - many people would end up
wiping the system, or having to go to an Apple store.

This is not reportable. I got only a generic error or a hanging system. I
can't reproduce it. I don't know why it started or why it stopped. Yet it was
almost certainly an APFS issue.

Even if I wanted to play, my priority was to get the work laptop usable again.

------
kbumsik
Two bugs are described in this article:

1. An APFS volume's free space doesn't reflect a smaller amount of free
space on the underlying disk

2. The diskimages-helper application doesn't report errors when write
requests fail to grow the disk image

These aren't even complex problems inherent to the new format; it's just
that Apple forgot basic checks. It's like the root access with an empty
password incident that happened two months ago. Why do these serious but
basic problems keep happening? What is going on at Apple?
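
A rough way to observe (1) for yourself, assuming hdiutil is on hand; the
size and paths below are only illustrative:

    # Create an APFS sparse image far larger than the free space on the
    # disk that will hold it.
    hdiutil create -size 500g -type SPARSE -fs APFS -volname Backup ~/Backup.sparseimage

    # Attach it, then compare the free space reported for the image's
    # volume with the free space of the underlying disk.
    hdiutil attach ~/Backup.sparseimage
    df -h /Volumes/Backup /

With HFS+ inside the image, the first number tracks the underlying disk; per
the article, with APFS it does not.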

~~~
huslage
(1) is incorrect. Sparsebundles are capable of over-provisioning the storage
below them _by design_; they always have been. I believe they are essentially
APFS snapshots now. This behavior is consistent with most other filesystems
that have similar constructs.

(2) is the real issue here.

~~~
dreamcompiler
> Sparsebundles are capable of over-provisioning the storage below them _by
> design_.

According to TFA, HFS+ sparsebundles reflect the limitations of their
underlying volume, while APFS sparsebundles do not. Seems clear to me that
this is a bug.

------
saagarjha
Sure, it's a new filesystem; it's bound to have bugs. However, one pain
point I've identified is that the existing tools often have _no idea_ how to
deal with APFS. I'm currently typing this from a Mac with an APFS drive that
is almost certainly experiencing filesystem corruption–I have folders that
suddenly lose track of their contents and become _impossible to delete_;
however, existing tools such as fsck, diskutil, etc. can't do anything to fix
the issue, because their idea of how APFS works is woefully inadequate.

~~~
jaaames
Yep, I had an APFS volume that I couldn't even health check. I was pretty
sure it was filesystem corruption as it'd hang and shut off randomly, then
come back up without issues, same as when the SSD died the first time. Also,
an APFS Time Machine volume would never finish encrypting, even after being
plugged in for days.

~~~
madeofpalk
Why would you assume that's file system corruption?

------
praseodym
APFS not having block or file-level checksums really seems like a big
oversight to me. While the filesystem’s designers considered the hardware-
level guarantees to be sufficient [1], this issue shows that there is an
entire class of problems that they have not considered. Disk images and
loopback-mounted filesystems or even disk-level cloning introduce additional
layers of complexity where a filesystem can be silently corrupted, even when
the actual physical storage layer is perfectly reliable.

A filesystem should be able to last for decades (HFS was designed thirty
years ago); I regard not having checksums in a brand-new filesystem as an
over-optimistic tradeoff.

[1] [http://dtrace.org/blogs/ahl/2016/06/19/apfs-
part5/](http://dtrace.org/blogs/ahl/2016/06/19/apfs-part5/)

~~~
jchb
Apple's HFS+ didn't have file data checksums either. The default filesystem
on most Linux distributions, ext4, doesn't either; it stores checksums of the
file metadata, not the file data. Same story with Windows' NTFS. Microsoft's
newer ReFS filesystem has file data checksums disabled by default. So it
seems like a tradeoff that most of the major operating systems are making,
most likely for performance reasons.

Edit: macOS disk images do have a checksum of the whole image data though. The
issue mentioned in the article seems to be caused by an oversight in the disk
image helper app, rather than in the APFS filesystem itself.

~~~
temprature
Performance is only an issue if your disk can write faster than your CPU can
hash. hammer2 changed its hash a couple of years ago because this started
happening with newer NVMe drives[0], but before that disk writes weren't CPU-
bound.

[0]
[http://lists.dragonflybsd.org/pipermail/commits/2016-June/50...](http://lists.dragonflybsd.org/pipermail/commits/2016-June/500610.html)
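
For a rough sense of which side is the bottleneck on a given machine, you
can compare hash throughput against sequential write speed (a back-of-the-envelope
check, not a real benchmark):

    # Single-core hash throughput (OpenSSL ships with macOS).
    openssl speed sha256

    # Very rough sequential write speed: write a 1 GB file of zeroes.
    dd if=/dev/zero of=/tmp/ddtest bs=1m count=1024 && rm /tmp/ddtest

If the hash numbers comfortably exceed the write numbers, checksumming on
write shouldn't be CPU-bound; as the hammer2 change shows, fast NVMe drives
can flip that.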

~~~
jchb
What about:

* Disk reads may unnecessarily trash the CPU caches, because the CPU will
need to verify the checksum when the DMA read is done, even if the app isn't
going to process the data immediately afterwards

* Battery life - without checksums the CPU can stay mostly idle, and go into
a lower power mode, while the disk controller does its job

------
derefr
I tend to create sparsebundles to clone my git repos within, to get around the
overhead of having huge numbers of inodes on a volume. (Copying, deleting,
unpacking archives, Spotlight indexing—all are way slower when you have the
worktrees and .git directories from a thousand large repos splayed out across
your disk.) So I was a little worried here.

Thankfully, I had manually been setting my sparsebundles back to HFS+ on
creation, because I saw no reason to make them APFS containers.
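
If anyone wants to try this, a minimal sketch of the setup (size, volume
name, and paths are just examples; the key part is forcing HFS+ with -fs):

    # Create a sparse bundle that only consumes space as it fills,
    # explicitly formatted as journaled HFS+ rather than APFS.
    hdiutil create -type SPARSEBUNDLE -fs HFS+J -size 100g \
        -volname Repos ~/Repos.sparsebundle

    # Mount it and keep the worktrees inside.
    hdiutil attach ~/Repos.sparsebundle
    cd /Volumes/Repos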

~~~
eropple
This is a neat idea, I've never thought about doing that. You should write a
blog post about it.

~~~
blattimwind
TBH, an I/O system where a file system inside a loopback device on another
file system is faster than using said file system directly sounds kinda
broken/poorly scaling to me.

~~~
eropple
Having Spotlight ignore .git directories and the like is probably wise, I
would agree with that. But it's text, even if it's basically garbage text
(from a user perspective). So I can understand how a sparsebundle is a decent
end-around.

The Finder in general ends up basically being useless for me for similar
reasons; I have dozens of random dependency files I don't even recognize pop
up in "All My Files".

~~~
dunham
Spotlight ignores hidden directories (e.g. .git) and directories whose names
end in .noindex. You can create a file in one with a unique name and try to
mdfind it to verify this.
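
A quick way to check, using throwaway names (the directory and file names
below are arbitrary):

    # Put a uniquely named file inside a .noindex directory.
    mkdir -p ~/spotlight_test.noindex
    echo hello > ~/spotlight_test.noindex/qzx_unique_12345.txt

    # Give Spotlight a moment, then search for the file by name.
    sleep 60
    mdfind -name qzx_unique_12345

No result suggests the directory was skipped; compare against the same file
dropped into a normal folder.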

~~~
eropple
Does it ignore them, or does it traverse them and throw them out?

Anecdotally it certainly seems like indexing is slower on my dev drive than
anywhere else, so I'm curious.

~~~
dunham
I'm sure it gets the events. It probably has to walk back up the tree to
determine if the file is hidden. Dunno how much work it does. I presume it
doesn't actually do the metadata extraction from the files. (But my
presumption is based on "surely they wouldn't do that".)

The biggest offender for me when I touch a lot of files is Dropbox. It seems
to use a lot of CPU when, e.g., an Xcode update is being installed. I've read
that they had to listen to events for the whole volume because the more
specific APIs weren't giving them the data they needed, but you'd think they
could fast-path the files that were outside their sandbox.

Is your dev drive a platter drive or SSD? I've found that the last few major
releases of osx have big performance issues on systems with old-school hard
drives. (Frequent beach-balling, etc.)

------
wiradikusuma
I upgraded my hackintosh to High Sierra with APFS. The next day, I
accidentally switched off the machine while it was in the _process_ of
shutting down (the screen had gone blank, but the case lights were still on).

Next time I turned it on, I couldn't get past the login screen (it gave me a
forever beach ball).

I put the SSD inside my old MBP as a secondary drive to recover the data.

The SSD was corrupted and most of the data was gone: files showed up in
Finder but couldn't be copied.

I googled for solutions, but it seems I'm the first to experience this.

~~~
voldemort1968
...So not an actual Apple Product.

~~~
bhj
My (limited) understanding of APFS is that it forgoes some integrity checks on
the assumption that they have already been done by lower-level hardware. This
is of course a debatable design decision, but it may indeed be unwise to use
APFS on non-Apple hardware.

~~~
ysleepy
No, APFS has to be usable on USB drives and the like; relying on Apple
hardware guarantees would be a fatal design flaw.

~~~
bhj
From [http://dtrace.org/blogs/ahl/2016/06/19/apfs-
part5/](http://dtrace.org/blogs/ahl/2016/06/19/apfs-part5/) :

"Explicitly not checksumming user data is a little more interesting. The APFS
engineers I talked to cited strong ECC protection within Apple storage
devices. Both flash SSDs and magnetic media HDDs use redundant data to detect
and correct errors. The engineers contend that Apple devices basically don’t
return bogus data. NAND uses extra data, e.g. 128 bytes per 4KB page, so that
errors can be corrected and detected. (For reference, ZFS uses a fixed size 32
byte checksum for blocks ranging from 512 bytes to megabytes. That’s small by
comparison, but bear in mind that the SSD’s ECC is required for the expected
analog variances within the media.) The devices have a bit error rate that’s
tiny enough to expect no errors over the device’s lifetime. In addition, there
are other sources of device errors where a file system’s redundant check could
be invaluable. SSDs have a multitude of components, and in volume consumer
products they rarely contain end-to-end ECC protection leaving the possibility
of data being corrupted in transit. Further, their complex firmware can (does)
contain bugs that can result in data loss."

(sorry for the edits, I finally found the paragraph my memory was referring
to)

~~~
ghusbands
But if they're so confident in the disk, then why do they checksum the
metadata? They should either trust the disk and have no checksums or not trust
the disk and checksum everything.

There are plenty of other reasons not to checksum user data, as it's a choice
many have made, but that they trust the disk is an invalid argument.

------
baxtr
I get the feeling that many of those who comment didn’t read the article. It
says

> Note: What I describe below applies to APFS sparse disk images only —
> ordinary APFS volumes (e.g. your SSD startup disk) are not affected by this
> problem. While the underlying problem here is very serious, this is not
> likely to be a widespread problem, and will be most applicable to a small
> subset of backups.

------
bonestamp2
A friend of mine lost a ton of data this week after mac os crashed and he
restarted it. The only things he had done since getting the computer a week
ago were:

1. Update to High Sierra

2. Copy over files from the old Mac

3. Record about 40 GB of screen-share data using QuickTime (which is what he
was doing when it crashed)

He spent hours on the phone with Apple; the tech said he had never seen
anything like it, and they weren't able to recover his data... but after
reading the other horror stories in this thread, there seem to be some
serious problems with High Sierra and/or APFS.

~~~
deergomoo
I had a kernel panic a few weeks ago that left the OS in a state where it
was unable to boot. It seemed to think it was mid-upgrade and was complaining
about missing packages that the upgrade would read from. Thankfully macOS has
an in-place OS reinstall option, and as far as I've been able to see all my
data is totally fine. But it was bizarre, and I've never experienced anything
like it in a decade of using Macs.

~~~
bonestamp2
Everything running fine now after the OS reinstall?

------
tlo
My APFS volume got corrupted, probably during the upgrade to High Sierra. No
data was lost, but my disk is missing almost half its space. See
[https://apple.stackexchange.com/q/311843/26185](https://apple.stackexchange.com/q/311843/26185)

~~~
rangibaby
I “lost” a lot of space due to Time Machine local backups. It was
frustrating to research, and I thought there was something seriously wrong
with my computer. Try deleting the local backups and see if you get some
space back.

~~~
T-N-T
I had to make room on my MBP SSD to install windows through bootcamp and
bashed my head against a wall on the same issue. It took me half an hour to
find the reason why so much of my hard drive wasn't available despite me
having deleted almost all third party apps and personal data on my macOS
partition. Time Machine does 'local backups' and there is NOWHERE in the user
interface that fully explains the space they occupy and how to get rid of it.
To delete those local backups you need to use the terminal program tmutil.
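
For anyone else hitting this, the relevant tmutil invocations on High Sierra
look roughly like this (the snapshot date is only an example; use the ones
the first command prints):

    # List the local APFS snapshots Time Machine keeps on the startup disk.
    tmutil listlocalsnapshots /

    # Delete one by its date stamp.
    sudo tmutil deletelocalsnapshots 2018-02-15-123456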

That gave me even more vindication for my move. Also, you really don't know
how fast your hardware is until you've used something other than macOS on it.
From booting the system to launching software... everything is snappier now.

~~~
rangibaby
Yes it’s a really poor design decision. They could have at least added “local
time machine backups” color to Disk Utility, or added a way to turn them off.

Agreed re: snappiness of other OS. Ubuntu flies on my 2013 MBP.

~~~
wila
There used to be a way to turn it off before High Sierra; now the only thing
you can do is delete the local snapshots. [0]

[0] [https://forums.macrumors.com/threads/solution-reclaim-
storag...](https://forums.macrumors.com/threads/solution-reclaim-storage-back-
from-system.2073174/)

------
gigatexal
Should have paid Oracle and just put ZFS on OS X.

~~~
alwillis
Apple killed ZFS on macOS nearly 10 years ago. It wouldn’t make sense on a
watch or a phone anyway: [http://www.zdnet.com/article/mac-zfs-is-dead-
rip/](http://www.zdnet.com/article/mac-zfs-is-dead-rip/)

~~~
gigatexal
200B in the bank and they couldn’t find a way to port it to other platforms? I
think they could.

~~~
alwillis
It’s not about not being able to port it; it’s not the right tool for the job.
ZFS is a file system designed for servers; the Apple Watch, Apple TV and the
iPhone and iPad aren’t servers.

~~~
STRML
That doesn't mean it can't be adapted and tuned for mobile. The average
smartwatch has more processing power than the average server did at the time
the ZFS project was launched (2001).

~~~
gigatexal
+1 to that. Or at least some subset that speaks the ZFS-protocol if such a
thing exists.

------
notadoc
High Sierra seems like a real gem.

~~~
FreakyT
Honestly, it's the first Mac OS release I'd actively recommend avoiding
upgrading to. It offers essentially no benefits over the prior release, and a
whole lot of downsides. (Beyond the security issues, High Sierra also drops
compatibility with a lot of older software.)

~~~
protomyth
Lion actively screwed up normal people's workflows with the botched "Save As"
replacement. I still want an explanation of why they thought that was a good
idea. We skipped that one after one of the admin assistants discovered the new
joy.

The whole California series of OS releases has had a broken Finder. I see
some fixes in High Sierra, but it's still buggy as heck for large file moves
and broken scripting. I'm hoping they take a long hard look at Mac OS like
they seem to be doing with the next iOS. I can forgive removing some UNIX
commands, but the general bugs and unexplained crashes are starting to get on
my nerves.

~~~
dreamcompiler
I know I sound like an old crank, but I've been a Mac user since 1984. And the
last MacOS version that I completely trusted as being loyal to my needs and
workflow was Snow Leopard. Every later version has felt like it was really
Apple's OS and I was just borrowing it.

~~~
martinald
Totally agree. I used to be excited about upgrading macOS; now I absolutely
dread it. I only upgrade when Xcode etc. doesn't work on the old OS anymore.

Until 10.13.3 I could barely use my MBP; horrendous graphical corruption
issues. How this could happen, I have no idea.

------
hitekker
Yet another reason to stick with El Capitan.

~~~
Apocryphon
I really want to, but I'm locked out of Xcode 9 if I linger. If only I could
run it through some sort of virtualization.

~~~
wila
VMware Fusion supports High Sierra as a guest, including APFS. I was under
the impression that Parallels does the same. Not sure about VirtualBox,
though.

------
tqkxzugoaupvwqr
The same behavior (the sparse bundle disk image not updating the free space
amount in accordance with the underlying disk’s free space) is present if one
selects ExFAT as format in Disk Utility.

I haven’t tested whether copying more data into the disk image than the
underlying disk has free space also results in file corruption.

------
testplzignore
> To prevent errors when a filesystem inside of a sparse image has more free
> space than the volume holding the sparse image, HFS+ volumes inside sparse
> images will report an amount of free space slightly less than the amount of
> free space on the volume on which image resides.

Can anyone explain the "slightly less than" part of this? Why wouldn't it just
be "equal to"?

~~~
verisimilitude
The sentences following that statement in the hdiutil manual are also helpful:

"The image filesystem currently only behaves this way as a result of a direct
attach action and will not behave this way if, for example, the filesystem is
unmounted and remounted. Moving the image file to a different volume with
sufficient free space will allow the image's filesystem to grow to its full
size."

hdiutil has some of the best man pages I've ever run across.

------
rlkf
Is the incident with the matching checksum that he mentions because APFS
only checksums metadata (which lives in preallocated space on the image), or
is it his own checksum (say, sha1sum), I wonder? It seems strange that the
filesystem driver would cache 500 GB of sequentially written data in RAM.

~~~
zbentley
> It seems strange that the filesystem driver would cache 500 GB of
> sequentially written data in RAM.

That was the most interesting/worrying part of TFA, and I would love to see
the article clarify how the checksum tests were conducted.

Presumably, the "md5" command-line tool has no special fallback to a
filesystem checksum cache (if it does, rather a lot of my life has been a
lie, I'm afraid). Given that, could we assume that, if the "lost" writes
totalled $X GB of data, any memory-caching of the file would only work in the
presence of at least $X GB of free system memory (RAM plus swap)?

I'd also be interested in learning what happens if there's less than that
amount of memory available. Will the checksum fail? Will an error occur
elsewhere? Will the system have some sort of memory (and swap) exhaustion
failure/panic?
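
One way to probe this, under the assumption that the phantom data lives in
the page cache: checksum the file, drop the cache, and checksum again (the
purge utility ships with macOS; the path below is hypothetical):

    # Checksum while the image is mounted and the data may still be cached.
    md5 /Volumes/Backup/bigfile

    # Flush the filesystem cache, then checksum again from what's on disk.
    sudo purge
    md5 /Volumes/Backup/bigfile

If the two digests differ, the first was satisfied from memory rather than
from what actually made it into the image.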

~~~
praseodym
The video embedded in TFA shows md5 reporting identical checksums before
unmounting the disk image, so it must be reading the data from a cache.

------
agildehaus
Can you even test a filesystem properly internally?

Seems to me we're all involved in a massive public beta.

~~~
tankenmate
Apple authored a tool called fsx (file system exerciser), but I doubt it
checks for corner cases for loopback mounts. Maybe they should add such tests.

~~~
koverstreet
That tool has been around for ages, I think it might originally be from SGI.

------
woodfall
So why exactly does somebody back up into an image with variable size, when
they could just dd together a fixed-size image that makes these free-space
calculations unnecessary?

~~~
tqkxzugoaupvwqr
I use a sparse bundle disk image, i.e. a disk image with variable size that
consists of multiple files under the hood, because it is more efficient to
back up over a network. Instead of uploading a 50 GB file to cloud storage
on every backup, only a fraction of the data has to be uploaded (the sparse
bundle's changed files), which makes backing up the 50 GB image very fast if
only a few megabytes were added.

If anyone is curious: I use restic[1] as backup client and Backblaze B2[2] as
backup storage. Works well with sparse bundles.

[1] [https://restic.net](https://restic.net)

[2] [https://www.backblaze.com/b2/cloud-
storage.html](https://www.backblaze.com/b2/cloud-storage.html)
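
For reference, the setup is roughly the following (the bucket name,
repository path, and credentials are placeholders):

    # One-time: point restic at a Backblaze B2 repository.
    export B2_ACCOUNT_ID=...        # from the B2 account page
    export B2_ACCOUNT_KEY=...
    restic -r b2:my-bucket:backups init

    # Each run skips the band files inside the bundle that haven't changed.
    restic -r b2:my-bucket:backups backup ~/Backup.sparsebundle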

~~~
woodfall
Wait, why would the 50 GB need to be transferred fully every time if it was a
fixed size image?

~~~
tqkxzugoaupvwqr
The very first backup of a fixed-size image will be the full size of the
image, e.g. 50 GB, no matter which backup software you use, even if no files
exist inside the image. The very first backup of a sparse bundle disk image
will be ~100 MB (the initial size of that image).

On repeated backups, some backup software operates at the file level and
uploads the whole file if it changed. So if you have a fixed-size 50 GB
image, mount it, add a file, and unmount it, the image has changed and the
whole 50 GB file has to be uploaded (with some backup software).

~~~
Nullabillity
Sure, but any halfway competent backup program would split it into chunks
before deduplication anyway, rendering the exercise pointless.

------
voldemort1968
Not to downplay the importance of this, but it reads as clickbait that you
wait until the second paragraph to say "oh yeah, it's only sparsebundles. Just
those things that almost nobody uses."

~~~
yborg
Yes, nobody - like for example Apple, to implement Time Machine.

But now perhaps I better understand why Time Machine backups aren't supported
on APFS.

------
blumomo
Click bait.

 _What I describe below applies to APFS sparse disk images only — ordinary
APFS volumes (e.g. your SSD startup disk) are not affected by this problem.
While the underlying problem here is very serious, this is not likely to be a
widespread problem, and will be most applicable to a small subset of backups.
Disk images are not used for most backup task activity, they are generally
only applicable when making backups to network volumes. If you make backups to
network volumes, read on to learn more._

~~~
dictum
The title clearly qualifies the claim: "MacOS may lose data __on APFS-
formatted disk images__"

I didn't know there were APFS-formatted disk images (new in 10.13). Even when
you consider the many different kinds of disk images that macOS supports,
there's a pretty clear distinction between _disk image_ and _a backup of your
startup disk, made to another partition in another drive_.

Any additional clarification would get into "MacOS may lose data on APFS-
formatted disk images (disk images, not disk-to-disk, as in another volume..."
territory.

~~~
condescendence
Yeah this is the opposite of a click-bait title; it clearly explains what the
article is about.

"may" lose data on "APFS-formatted" disk images.

