A disk so full, it couldn't be restored

miles · 2024-04-04T02:32:42 1712197962

The author might have had better luck by using an external storage device to boot the Mac and delete unneeded files on the internal disk from there:

Use an external storage device as a Mac startup disk https://support.apple.com/en-us/111336

Was surprised to learn that with Apple silicon-based Macs, not all ports are equal when it comes to external booting:

If you're using a Mac computer with Apple silicon, your Mac has one or more USB or Thunderbolt ports that have a type USB-C connector. While you're installing macOS on your storage device, it matters which of these ports you use. After installation is complete, you can connect your storage device to any of them.

* Mac laptop computer: Use any USB-C port except the leftmost USB-C port when facing the ports on the left side of the Mac.

* iMac: Use any USB-C port except the rightmost USB-C port when facing the back of the Mac.

* Mac mini: Use any USB-C port except the leftmost USB-C port when facing the back of the Mac.

* Mac Studio: Use any USB-C port except the rightmost USB-C port when facing the back of the Mac.

* Mac Pro with desktop enclosure: Use any USB-C port except the one on the top of the Mac that is farthest from the power button.

* Mac Pro with rack enclosure: Use any USB-C port except the one on the front of the Mac that's closest to the power button.

mrb · 2024-04-04T11:28:39 1712230119

The author tried essentially the same thing as what you suggest. He booted into recoveryOS (a separate partition) then from there tried to delete files from the main system partition. But rm failed with the same error "No space left on device". So as others have suggested, truncating a file might have worked "echo -n >file"

davorak · 2024-04-04T14:28:40 1712240920

The next step I have used and seen recommended after recoveryOS is single user mode, which is what I think I used to solve the same issue on an old mac. I vaguely remember another reason I used single user mode where recovery mode failed but I do not remember any details.

My bet is that you can get nearly the same functionality with single user mode vs booting from external media, but I only have a vague understanding of the limitations of all three modes from 3-5 uses via tutorials.

userbinator · 2024-04-04T02:43:16 1712198596

If the filesystem itself got into a deadlocked state, booting from anything and going through the FS driver to delete files from it won't work.

HumanOstrich · 2024-04-04T02:55:39 1712199339

What do you mean by "deadlocked state" for a filesystem?

syncsynchalt · 2024-04-04T03:19:29 1712200769

Modern (well, post-ZFS) filesystems operate by moving the filesystem through state changes where data is not (immediately) destroyed, but older versions of the data are still available for various purposes. Similar to an ACID-compliant database, something like a backup or recovery process can still access older snapshots of the filesystem, for various values of "older" that might range from milliseconds to seconds to years.

With that in mind, you can see how we get in a scenario where deleting a file will require a minor bit of storage for recordkeeping the old and new states, before it can actually free up the storage by releasing the old state. There is supposed to be an escape hatch for getting yourself out of a situation where there isn't even enough storage for this little bit of record keeping, but either the author didn't know whatever trick is needed or the filesystem code wasn't well-behaved in this area (it's a corner-case that isn't often tested).

jrockway · 2024-04-04T05:32:33 1712208753

I'm most surprised by the lack of testing. Macs tend to ship with much smaller SSDs than other computers because that's how Apple makes money ($600 for 1.5TB of flash vs. $100/2TB if you buy an NVMe SSD), so I'd expect that people run out of space pretty frequently.

callalex · 2024-04-04T17:45:14 1712252714

And if you make the experience broken and frustrating people will throw the whole computer away and buy a new one since the storage can’t be upgraded.

kjkjadksj · 2024-04-04T21:29:26 1712266166

Not to mention potentially paying for your cloud storage for life.

brokenmachine · 2024-04-04T22:41:39 1712270499

Don't forget to make it so nothing they have works properly with any other brands devices so the next one they buy must also be Apple.

Rinse and repeat.

dataflow · 2024-04-04T05:32:23 1712208743

It feels like insanity that the default configuration of any filesystem intended for laymen can fail to delete a file due to anything other than an I/O error. If you want to keep a snapshot, at least bypass it when disk space runs out? How many customers do the vendors think would prefer the alternative?!

kamray23 · 2024-04-04T08:03:38 1712217818

It's not really just keeping snapshots that is the issue, usually. It's just normal FS operation, meant to prevent data corruption if any of these actions is interrupted, as well as various space-saving measures. Some FSs link files together when saving mass data so that identical blocks between them are only stored once, which means any of those files can only be fully deleted when all of them are. Some FSs log actions onto disk before and after doing them so that they can be restarted if interrupted. Some FSs do genuinely keep files on disk if they're already referenced in a snapshot even if you delete them – this is one instance where a modal about the issue should probably pop up if disk space is low. And some OSes really really really want to move things to .Trash1000 or something else stupid instead of deleting them.

p_l · 2024-04-04T08:22:02 1712218922

Pretty much by the time you get to 100% full on ZFS, the latency is going to get atrocious anyway, but from my understanding there are multiple steps (from simplest to worst case) that ZFS permits in case you do hit the error:

1. Just remove some files - ZFS will attempt to do the right thing

2. Remove old snapshots

3. Mount the drive from another system (so nothing tries writing to it), then remove some files, reboot back to normal

4. Use `zfs send` to copy the data you want to keep to another bigger drive temporarily, then either prune the data or if you already filtered out any old snapshots, zero the original pool and reload it by `zfs send` from before.

Shorel · 2024-04-04T10:56:55 1712228215

Modern defrag seems very cumbersome xD

p_l · 2024-04-04T12:34:09 1712234049

Defragmentation and ability to do it are not free.

You can have cheap defrag but comparatively brittle filesystems by making things modifiable in place.

You can have filesystem that has as its primary value "never lose your data", but in exchange defragmentation is expensive.

dataflow · 2024-04-04T22:16:25 1712268985

I don't buy this? What does defragmentation have to do with snapshotting? Defragmentation is just a rearrangement of the underlying blocks. Wouldn't snapshots just get moved around?

p_l · 2024-04-04T23:06:43 1712272003

The problem is that you have to track down all pointers pointing to specific block.

With snapshotting, especially with filesystems that can only write data through snapshots (like ZFS), blocks can be referred to by many pointers.

It's similar to evaluating liveness of object in a GC, except you're now operating on possibly gigantic heap with very... pointer-ful objects, that you have to rewrite - which goes against core principle of ZFS which is data safety. You're doing essentially a huge history rewrite on something like git repo, with billions of small objects, and doing it safely means you have to rewrite every metadata block that in any way refers to given data block - and rewrite every metadata block pointing to those metadata blocks.

dataflow · 2024-04-04T23:22:24 1712272944

But more pointers is just more cost, not outright inability to do it. The debate wasn't over whether defragmentation itself is costly. The question was whether merely making defragmentation possible would impose a cost on the rest of the system. So far you've only explained why defragmentation on a snapshotting volume would be expensive with typical schemes, which is entirely uncontroversial. But you neither explain why you believe defragmentation would be impossible (no "ability to do it") with your scheme, nor why you believe it's impossible for other schemes to make it possible "for free"?

In fact, the main difficulty with garbage collectors is maintaining real-time performance. Throw that constraint out, and the game changes entirely.

p_l · 2024-04-05T00:38:54 1712277534

I never claimed it's impossible - I claimed it's expensive. Prohibitively expensive, as the team at Sun found out when they attempted to do so, and offline defrag becomes easy with two-space approach which is essentially "zfs send to separate device".

You can attempt to add an extra indirection layer, but it does not really reduce fragmentation, it just lets you remap existing blocks to another location at a cost of extra lookup. This is in fact implemented in ZFS as solution for erroneous addition of a vdev, allowing device removal though due to performance cost its oriented mostly at "oops, I added the device wrongly, let me quickly revert".

dataflow · 2024-04-05T02:26:38 1712283998

If by "not able to" you meant "prohibitively expensive" - well, I also don't see why it's prohibitively expensive even without indirection. Moving blocks would seem to be a matter of (a) copy the data, (b) back up the old pointers, (c) update the pointers in-place, (d) mark the block move as committed, (e) and delete the old data/backups. If you crash in the middle you have the backup metadata journaled there to restore from. No indirection. What am I missing? I feel like you might have unstated assumptions somewhere?

p_l · 2024-04-05T08:54:13 1712307253

My bad - I'm a bit too into the topic and sometimes forget what other people might not know ^^;

You're missing the part where (c) is forbidden by design of the filesystem, because ZFS is not just "Copy on Write" by default (like BTRFS, which has in-place rewrite option, IIRC) nor LVM/disk-mapper snapshot which similarly don't have strong invariants on CoW.

ZFS writes data to disk in two ways - a (logically) write-ahead log called ZFS Intent Log (which handles synchronous writes and is read only on pool import), and transaction group sync (txgsync), where all newly written data is linked into new metadata tree, sharing structure with previous TXG metadata tree (so unchanged branches are shared), and the pointer to the head of the tree is committed into on-disk circular buffer of at least 128 pointers.

Every snapshot in ZFS is essentially a pointer to such metadata tree - all writes in ZFS are done by creating a new snapshot. The named snapshots are just rooted in different places in filesystem. This means that sometimes even in case of catastrophic software bug (for example, master branch had for few commits a bug where they accidentally changed on-disk layout of some structures - one person ran master branch and hit that resulting in pool that could not be imported... but the design meant they could tell ZFS import to "rewind" to TXG sync number from before the bug)

Updating the blocks in place violates design invariants - once you violate them, the data safety guarantees are no longer guarantees. And this makes it into minimally offline operation, and at that point the type of client that needs in-place defragmentation can reasonably do the two-space trick (if you're big enough, to make that infeasible, you're probably big enough to easily throw in an extra JBOD at least and relieve fragmentation pressure).

To make latter paragraphs understandable (beware, ZFS internals as I remember them):

ZFS is constructed of multiple layers[1] - from the bottom (somewhat simplified):

1. SPA (Storage Pool Allocator) - what implements "vdevs" - the only layer that actually deals with blocks. It implements access to block devices, mirroring, RAIDz, draid, etc. and exposes single block-oriented interface upwards

2. DMU (Data Management Unit) - An object oriented storage system. Turns bunch of blocks into object-oriented PUT/GET/PATCH/DELETE like setup, with 128bit object IDs. Also handles base metadata - the immutable/write-once trees for turning "here's a 1GB blob of data" into 512b to 1MB portions on disk. For every given metadata tree/snapshot, there is no in-place changes - modifying an object "in place" means that new txgsync has, for given object ID, a new tree of blocks that shares as much structure with previous one as possible.

3. DSL / ZIL / ZAP - provide basic structures on top of the DMU - DSL is what gives you "naming" ability for datasets and snapshots, ZIL handles the write-ahead log for dsync/fsync, ZAP provides a key-value store in DMU objects.

4. ZPL / ZVOL / Lustre / etc - Those are the parts that implement user-visible filesystem. ZPL is ZFS Posix Layer, which is a POSIX-compatible filesystem implemented over object storage. ZVOL does similar but presents emulated block device. Lustre-on-ZFS similarly talks directly to ZFS object layer instead of implementing ODT/OST on top of POSIX files again.

You could, in theory, add an extra indirection layer just for defragmentation, but this in turn makes problematic layering violation (something found at Sun when they tried to implement BPR) - because suddenly SPA layer (the layer that actually handles block-level addressing) needs to understand DMU's internals (or a layer between the two needing bi-directional knowledge). This makes for possibly brittle code, so again - possible but against overarching goals of the project.

The "vdev removal indirection" works because it doesn't really care about location - it allocates space from other vdevs and just ensures that all SPA addresses that have ID of the removed vdev, point to data allocated on other vdevs. It doesn't need to know how the SPA addresses are used by DMU objects

dataflow · 2024-04-05T20:48:35 1712350115

I appreciate the long explanation of ZFS, but I don't feel most of it really matters for the discussion here:

> Updating the blocks in place violates design invariants - once you violate them, the data safety guarantees are no longer guarantees.

Again - you can copy blocks prior to deleting anything, and commit them atomically, without losing safety. The fact that you (or ZFS) don't wish to do that doesn't mean it's somehow impossible.

> the type of client that needs in-place defragmentation can reasonably do the two-space trick (if you're big enough, to make that infeasible, you're probably big enough to easily throw in an extra JBOD at least and relieve fragmentation pressure).

You're moving goalposts drastically here. It's quite a leap to go from "has a bit of free space on each drive" to "can throw in more disks at whim", and the discussion wasn't about "only for these types of clients".

And, in any case, this is all pretty irrelevant to whether ZFS could support defragmentation.

> this makes it into minimally offline operation

See, that's your underlying assumption that you never stated. You want defragmentation to happen fully online, while the volume is still in use. What you're really trying to argue is "fully online defragmentation is prohibitive for ZFS", but you instead made the wide-sweeping claim that "defragmentation is prohibitive for snapshotted filesystems in general".

p_l · 2024-04-06T17:53:54 1712426034

You're hung on the word "impossible" which I never used.

I did say that there are trade offs and that some goals can make things like defragmentation expensive.

ZFS' main design was that it nothing short of (extensive) physical damage should allow destruction of users data. Everything else was secondary. As such, the project was not interested, ever, in supporting in-place updates.

You can design a system with other goals, or ones that are more flexible. But I'd argue that's why BTRFS got undying reputation for data loss - they were more flexible, and that unfortunately also opened way for more data loss bugs.

dataflow · 2024-04-06T20:32:19 1712435539

> You're hung on the word "impossible" which I never used.

That's not true. That was only in the beginning -- "impossible" was only what I originally took (and would still take, but I digress) your initial comment of "ability to defragment is not free" to be saying. It's literally saying that if you don't pay a cost (presumably, performance or reliability), then you become unable to defragment. That sounded like impossibility, hence the initial discussion.

Later you said you actually meant it'd be "prohibitively expensive". Which is fine, but then I argued against that too. So now I'm arguing against 2 things: impossibility and prohibitive-expensiveness, neither of which I'm hung up on.

> ZFS' main design was that it nothing short of (extensive) physical damage should allow destruction of users data. Everything else was secondary.

Tongue only halfway in cheek, but why do you keep referring to ZFS like it's GodFS? The discussion was about "filesystems" but you keep moving the goalposts to "ZFS". Somehow it appears you feel that if ZFS couldn't achieve something then nothing else possibly could?

Analogy: imagine if you'd claimed "button interfaces are prohibitively expensive for electric cars", I had objected to that assertion, and then you kept presenting "but Tesla switched to touchscreens because they turned out cheaper!" as evidence. That's how this conversation feels. Just because Tesla/ZFS has issues with something that doesn't mean it's somehow inherently prohibitive.

> As such, the project was not interested, ever, in supporting in-place updates.

Again: are we talking online-only, or are you allowing offline defrag? You keep avoiding making your assumptions explicit.

If you mean offline: it's completely irrelevant what the project is interested in doing. By analogy, Microsoft was not interested, ever, in allowing NTFS partitions to be moved or split or merged either, yet third-party vendors have supported those operations just fine. And on the same filesystem too, not merely a similar one!

If you mean online: you'd probably be some intrinsic trade-off eventually, but I'm skeptical it's at this particular juncture. Just because ZFS may have made something infeasible with its current implementation, that doesn't mean another implementation couldn't have... done an even better job? e.g., even with the current on-disk structure of ZFS (let alone a better one), even if a defragmentation-supporting implementation might not achieve 100% throughput while a defragmentation is ongoing, surely it could at least get some throughput during a defrag so that it doesn't need to go entirely offline? That would be a strict improvement over the current situation.

> But I'd argue that's why BTRFS got undying reputation for data loss - they were more flexible, and that unfortunately also opened way for more data loss bugs.

Hang on... a bug in the implementation is a whole different beast. We were discussing design features. Implementation bugs are... not in that picture. I'm pretty sure most people reading your earlier comments would get the impression that by "brittleness" you were referring to accidents like I/O failures & user error, not bugs in the implementation!

Finally... you might enjoy [1]. ;)

[1] https://www.reddit.com/r/zfs/comments/1826lgs/psa_its_not_bl...

werid · 2024-04-04T08:38:07 1712219887

i've filled up an zfs array to the point where i could not delete files.

the trick is to truncate a large enough files, or enough small files, to zero.

not sure if this is a universal shell trick, but worked on those i tried: "> filename"

pdimitar · 2024-04-04T10:04:08 1712225048

For reasons I am completely unwilling to research, just doing `> filename` has not worked for me in a while.

Since then I memorized this: `cat /dev/null >! filename`, and it has worked on systems with zsh and bash.

adrianmonk · 2024-04-04T14:39:55 1712241595

That seems to be zsh-specific syntax that is like ">" except that overrides a CLOBBER setting[1].

However, it won't work in bash. It will create file named "!" with the same contents as "filename". It is equivalent to "cat /dev/null filename > !". (Bash lets you put the redirection almost anywhere, including between one argument and another.)

---

[1] See https://zsh.sourceforge.io/Doc/Release/Redirection.html

pdimitar · 2024-04-04T18:03:49 1712253829

Yikes, then I have remembered wrong about bash, thank you.

In that case I'll just always use `truncate -s0` then. Safest option to remember without having to carry around context about which shell is running the script, it seems.

alias_neo · 2024-04-04T10:57:19 1712228239

"truncate -s0 filename"

I believe "> filename" only works correctly if you're root (at least in my experience, if I remember correctly).

EDIT: To remove <> from filename placeholder which might be confusing, and to put commands in quotes.

pdimitar · 2024-04-04T10:59:44 1712228384

Oh yes, that one also worked everywhere I tried, thanks for reminding me.

alias_neo · 2024-04-04T11:06:22 1712228782

Pleasure.

It saved me just yesterday when I needed to truncate hundreds of gigabytes of Docker logs on a system that had been having some issues for a while but I didn't want to recreate containers.

"truncate -s 0 /var/lib/docker/containers/**/*-json.log"

Will truncate all of the json logs for all of the containers on the host to 0 bytes.

Of course the system should have had logging configured better (rotation, limits, remote log) in the first place, but it isn't my system.

EDIT: Missing double-star.*

matja · 2024-04-04T11:48:12 1712231292

Simple to verify with strace -f bash -c "> file":

    openat(AT_FDCWD, "file", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

man 2 openat:

    O_TRUNC
        If the file already exists and is a regular file and the
        access mode allows writing (i.e., is O_RDWR or O_WRONLY) it
        will be truncated to length 0.
        ...

pdimitar · 2024-04-04T11:55:23 1712231723

Sure, but I just get an interactive prompt when I type `> file` and I honestly don't care to troubleshoot. ¯\_(ツ)_/¯

matja · 2024-04-05T14:05:28 1712325928

Probably you are using zsh and need:

    MULTIOS=1 > file

- zsh isn't POSIX compatible by default

pdimitar · 2024-04-05T14:25:07 1712327107

I see. But in this case it's best to just memorize `truncate -s0` which is shell-neutral.

matja · 2024-04-04T12:20:27 1712233227

Ok, we'll leave that a mystery then!

markandrewj · 2024-04-04T22:29:04 1712269744

Depending on the environment you can also use the truncate command. This will work if the file is open as well.

https://man7.org/linux/man-pages/man1/truncate.1.html

ralferoo · 2024-04-04T12:03:21 1712232201

It'd be better to do ": >filename"

: is a shell built-in for most shells that does nothing.

aidenn0 · 2024-04-04T03:16:27 1712200587

Some filesystems may require allocating metadata to delete a file. AFIK it's a non issue with traditional Berkeley-style systems, since metadata and data come from a separate pools. Notably ZFS has this problem.

em-bee · 2024-04-04T04:43:45 1712205825

btrfs has this problem too it seems. but there it is usually easy to add a usb stick to extend the filesystem and fix the problem.

i find it really frustrating though. why not just reserve some space?

steve_rambo · 2024-04-04T07:01:23 1712214083

btrfs does reserve some space for exactly this issue, although it might not always be enough.

https://btrfs.readthedocs.io/en/latest/btrfs-filesystem.html

> GlobalReserve is an artificial and internal emergency space. It is used e.g. when the filesystem is full. Its total size is dynamic based on the filesystem size, usually not larger than 512MiB, used may fluctuate.

aidenn0 · 2024-04-04T05:39:28 1712209168

Yeah, with ZFS some will make an unused dataset with a small reservation (say 1G) that you can then shrink to delete files if the disk is full.

rincebrain · 2024-04-04T08:07:49 1712218069

This hasn't been a problem you should be able to hit in ZFS in a long time.

It reserves a percent of your pool's total space precisely to avoid having 0 actual free space and only allows using space from that amount if the operation is a net gain on free space.

MarkSweep · 2024-04-04T16:29:33 1712248173

For more details about this slop space, see this comment:

https://github.com/openzfs/zfs/blob/99741bde59d1d1df0963009b...

p_l · 2024-04-04T08:23:04 1712218984

Yeah, a situation where you pool gets suspended due to no space and you can't delete files is considered a bug by OpenZFS.

rincebrain · 2024-04-04T15:53:11 1712245991

I mean, the pool should never have gotten suspended by that, even before OpenZFS was forked; just ENOSPC on rm.

aidenn0 · 2024-04-04T13:24:50 1712237090

Oh, that's good to know. I hit it in the past, but it was long enough ago that ZFS still had format versions.

rincebrain · 2024-04-04T15:51:54 1712245914

Yeah, the whole dance around slop space, if I remember my archaeology, went in shortly after the fork.

p_l · 2024-04-04T08:23:40 1712219020

The recommended solution is to apply a quota on top-level dataset, but that's mainly for preventing fragmentation or runaway writes.

bbarnett · 2024-04-04T12:27:31 1712233651

I think the solution is to not use a filesystem that is broken in this way.

p_l · 2024-04-04T12:31:25 1712233885

Note that ZFS explicitly has safeguards against total failure. No filesystem will work well with near full state when it comes to fragmentation.

bbarnett · 2024-04-04T13:17:26 1712236646

This is a whataboutism. Being unable to use the filesystem, due to space full, without arcane knowledge, is not the same as "not working well".

This is a brokwn implementation.

aidenn0 · 2024-04-04T13:26:57 1712237217

You're misunderstanding. See the sibling thread where p_l says that this problem has been resolved, and any further occurrence would be treated as a bug. Setting the quota is only done now to reduce fragmentation (ZFS's fragmentation avoidance requires sufficient free space to be effective).

bbarnett · 2024-04-04T13:41:12 1712238072

No, I'm not. They said the "recommended solution" for this issue is to use a quota.

They also said it was mainly used for other issues, such as fragmentation. In other words, this was stated as a fix for the file delete issue.

How does this invalidate my comment, that this was a broken implementation?

It doesn't matter if it will be fixed in the future, or was just fixed.

aidenn0 · 2024-04-04T16:14:35 1712247275

According to rincebrain, the "disk too full to delete files" was fixed "shortly after the fork" which means "shortly after 2012." My information was quite out of date.

bbarnett · 2024-04-04T16:17:43 1712247463

Well I'm glad they fixed a bug, which made the filesystem unusable. Good on them, and thank you for clarification.

pxx · 2024-04-04T03:15:26 1712200526

Where `rm`, or more technically unlink(2), fails due to ENOSPC, like in the article...

Hilariously this failure case doesn't seem to be listed in the docs. https://developer.apple.com/library/archive/documentation/Sy...

klausa · 2024-04-04T03:56:58 1712203018

Why do you think that would work; if using recoveryOS or starting the Mac Share Disk/Target Disk mode didn't?

appplication · 2024-04-04T05:07:06 1712207226

This is the kind of comment someone is going to be very happy to read in 8 years when they’re looking for answers for their (then) ancient Mac.

deeth_starr_v · 2024-04-04T04:47:48 1712206068

Anyone know why you can’t use the first usb-c port on a Mac laptop to make the bootable os?

Tsiklon · 2024-04-04T05:58:28 1712210308

The ports mentioned expose the serial interface that can be used to restore/revive the machine in DFU mode

https://support.apple.com/en-us/108900

That said, no idea why they can’t be used in this case

joecool1029 · 2024-04-04T06:05:34 1712210734

> That said, no idea why they can’t be used in this case

My intuitive guess here is how the ports are connected to the T2 security chip. One port is as you said a console port that allows access to perform commands to flash/recover/re-provision the T2 chip. Same as an OOB serial port on networking equip.

The rest of the ports the T2 chip has read/write access to devices connected to it. Since this is an OS drive, I'm guessing it needs to be encrypted and the T2 chip handles this function.

amelius · 2024-04-04T08:34:47 1712219687

That doesn't make it technically impossible to implement booting from that port.

PeterisP · 2024-04-04T13:18:11 1712236691

Sure, but it also doesn't make it necessary or useful to implement booting from that port - booting from a port IMHO is not a feature that Apple wants to offer to its target audience at all, so it's sufficient if some repair technician can do that according to a manual which says which port to use in which scenario.

p_l · 2024-04-04T09:25:07 1712222707

The firmware is based on iPhone boot process from my understanding, and simply does not have space in ROM to implement boot from external storage.

The rest of the code necessary to boot from external sources is located on main flash

amelius · 2024-04-04T09:46:37 1712223997

Yes, but the decision to use this firmware was made by Apple.

This is like saying my software did not work because it was based on an incompatible version of some library. Maybe so, but that is a bad excuse. Implementing systems is hard, and like the rest of us, Apple should not get away with bad excuses. And this is even more true because they control more of the stack.

pjerem · 2024-04-04T16:01:06 1712246466

OTOH, the current implementation works and is sufficient so Apple could easily decide that it’s not worth modifying firmware that already works to solve an inexistant issue.

amelius · 2024-04-04T18:51:48 1712256708

That's how you end up with a UX that works only for programmers (see e.g. Linux in the 90s).

pjerem · 2024-04-04T20:12:22 1712261542

We are talking about booting from USB. On a Mac M2. That’s literally the most power user feature of a MacBook.

The 3 users of this feature on this planet are already happy that it’s even possible at all. The only thing Apple could do is to document this clearly like adding a text in the boot drive selector.

nottorp · 2024-04-04T06:30:05 1712212205

> Mac laptop computer: Use any USB-C port except the leftmost USB-C port when facing the ports on the left side of the Mac.

Also on my mbpro at least the mentioned port is the one closest to the magsafe connector and may have funny electrical connections to it, perhaps.

plussed_reader · 2024-04-04T04:53:22 1712206402

What if it's through a USB c adapter to a usbA thumbstick?

Macha · 2024-04-04T12:06:48 1712232408

> Was surprised to learn that with Apple silicon-based Macs, not all ports are equal when it comes to external booting

iirc, not all ports were equal when it came to charging with the m1 macs, so this is actually not so surprising.

voidbert · 2024-04-04T12:22:44 1712233364

But charging through many ports requires extra circuitry to support more power on every port, while booting from multiple ports just requires the boot sequence firmware to talk to more than one USB controller (like PC motherboards do, for example)

timcederman · 2024-04-04T03:07:34 1712200054

Or boot it into Target Disk Mode using another machine.

olliej · 2024-04-04T06:15:31 1712211331

you can't boot the arm Macs into target disk mode, you can only boot to the recovery os and share the drive - it shows up as a network share iirc. I was super annoyed by this a few weeks ago because you can, for example, use spotlight to search for "target disk mode" and it will show up, and looks like it will take you to the reboot in target disk mode option, but once you're there it's just the standard "choose a boot drive" selector.

userbinator · 2024-04-04T02:27:31 1712197651

My best guess at what happened (based on a little knowledge of HFS+ disk structures, but not APFS) is that the journal file also filled up, and since deletion requires writing to it and possibly expanding it, you get into the unusual situation where deletion requires, at least temporarily, more space.

macOS continued to write files until there was just 41K free on the drive.

I've (accidentally) ran both NTFS and FAT32 to 0 bytes free, and it was always possible to delete something even in that situation.

Digging around in forums, I found that Sonoma has broken the SMB/Samba-based networking mount procedure for Time Machine restores, and no one had found a solution. This appears to still be the case in 14.4.

In my experience SMB became unreliable and just unacceptably buggy many years ago, starting around the 10.12-10.13 timeframe; and now it looks like Apple doesn't care about whether it works at all anymore.

I hate to think what people without decades of Mac experience do when confronted with systemic, cascading failures like this when I felt helpless despite what I thought I knew and all the answers I searched for and found on forums.

I don't have "decades of Mac experience", but the first thing I'd try is a fsck --- odd not to see that mentioned here.

If I were asked to recover from this situation, and couldn't just copy the necessary contents of the disk to another one before formatting it and then copying back, I'd get the APFS documentation (https://developer.apple.com/support/downloads/Apple-File-Sys...) and figure out what to edit (with dd and a hex editor) to get some free space.

arghwhat · 2024-04-04T08:01:25 1712217685

That's for a journalling filesystem. For CoW filesystems, the issue is that any change to the filesystem is done by making a new file tree containing your change, and then updating the root to point the new tree. Later, garbage collection finds files that are no longer part of an active tree and returns their storage to the pool.

Changes are usually batched to reduce the amount of tree changes to a manageable amount. A bonus of this design is that a filesystem snapshot is just another reference to a particular tree.

This requires space, but CoW filesystems also usually reserve an amount of emergency storage for this reason.

Lammy · 2024-04-04T05:15:43 1712207743

> Sonoma has broken the SMB/Samba-based networking

Apple dropped Samba in favor of their own implementation a long time ago after Samba adopted GPLv3:

https://lists.samba.org/archive/samba-announce/2007/000122.h...

https://www.engadget.com/2011-03-24-apple-to-drop-samba-netw...

dredmorbius · 2024-04-04T21:29:47 1712266187

Similarly, Apple has resolved the increasingly-aged stock Bash install, eschewing Bash's subsequent GPLv3 licencing, by making zsh the default login shell.

I'd like to see zsh also adopt GPLv3 to call Apple's bluff.

Lammy · 2024-04-05T20:54:26 1712350466

So would I for the lolz but they would probably just switch to fish or dash

https://github.com/fish-shell/fish-shell/blob/master/COPYING

https://git.kernel.org/pub/scm/utils/dash/dash.git/tree/COPY... (Default Ubuntu shell since 6.10)

dredmorbius · 2024-04-06T01:58:32 1712368712

dash isn't a fully-featured interactive shell, and would be a major step backward from bash. At least with zsh, Apple are moving forward on a feature basis. (And I say this as someone who's still stubbornly set on bash over that newfangled zsh nonsense ;-)

I'm less familiar with fish, but based on a very fuzzy awareness, it's at least fully-featured.

I do encounter dash on a few systems. It's the default shell on my OpenWRT networking kit, for example. I've installed bash where those systems have enough storage to accommodate it.

mkesper · 2024-04-05T05:11:54 1712293914

But that's apparently not working with the same features?

wazoox · 2024-04-04T11:28:32 1712230112

Ah, Apple. SMB has always performed from horribly slowly a few years back, to barely decent recently, but is still way slower than NFS or (oh the irony) Appleshare on the exact same hardware.

Tested a few years ago throughput to a big NAS connected in 10gigEo from a Hackintosh with BlackMagic Disk Speed Test :

* running Windows, SMB achieves 900MB/s

* running MacOS, SMB achieves 200MB/s

* running MacOS, NFS and AFP both achieve 1000MB/s

Anything related to professional work is a sad joke in MacOS, alas.

(People keep repeating that AFP is dead, however it still works fine as a client on my Mac Pro -- and performs so much better than SMB than it's almost comical).

itsTyrion · 2024-04-15T12:46:25 1713185185

Fun (until you run into it) fact: the same thing is possible with BTRFS and ZFS. If you manage fill it to the brim, you might have a problem. BTRFS tries to become read-only while there is still room for metadata so you can remount it in safe mode and delete something, but no safecguard is perfect.

> ran both NTFS and FAT32 to 0b and was able to delete something.

AFAIK those aren’t journaled, no?

begueradj · 2024-04-04T06:14:14 1712211254

So far, you're the only one who provided a technical explanation for this.

greenicon · 2024-04-04T06:12:42 1712211162

For a networked Time Machine restore you can reinstall MacOS without restoring first and then use the migration utility to restore from a remote Time Machine. That seems to use a different smb binary which works. Still, I find it infuriating that restoring, one of the most important things you do on a machine, is broken and was not caught by QA.

chrisjj · 2024-04-04T09:41:34 1712223694

> get into the unusual situation where deletion requires, at least temporarily, more space.

s/unusual/usual/ surely.

staticfloat · 2024-04-04T03:22:38 1712200958

I ran into an issue like this in my first ever job! I accidentally filled up a cluster with junk files and the sysadmin started sending me emails saying I needed to fix it ASAP but rm wouldn’t work. He taught me that file truncation usually works when deletion doesn’t, so you can usually do “cat /dev/null > foo” when “rm foo” doesn’t work.

mjevans · 2024-04-04T04:07:43 1712203663

In shell :>filepath often works...

However sometimes filesystems can't do that. For those cases, hopefully the filesystem supports: resize-grow, resize-shrink, and either additional temporary storage or is on top of an underlying system which can add/remove backing storage. You may also need to use custom commands to restore the filesystem's structure to one intended for a single block device (btrfs comes to mind here).

JohnMakin · 2024-04-04T06:29:49 1712212189

I was once in a situation years ago where a critical piece of infrastructure could brick itself irreparably with a deadlock unless it was always able to write to the file system, so I had a backup process just periodically send garbage directly to dev null and as far as I know that dirty hack is still running years later.

/dev/null is magical and worth reading into

pram · 2024-04-04T03:26:28 1712201188

You can actually just do >file

JdeBP · 2024-04-04T03:42:11 1712202131

Although note that several comments here report situations where truncation doesn't work either. 21st century filesystem formats are a lot more complex than UFS, and with things like snapshotting and journalling there are new ways for a filesystem to deadlock itself.

deltarholamda · 2024-04-04T16:20:47 1712247647

I accidentally filled a ZFS root SSD with a massive samba log file (samba log level set way high to debug a problem, and then forgot to reset it), and had to use truncate to get it back.

I knew that ZFS was better about this, but even so I still got that "oh... hell" sinking feeling when you really bork something.

dredmorbius · 2024-04-05T00:20:51 1712276451

Having recently experienced an over-capacity MacOS disk, "emptying" files in this manner simply did not work.

pdimitar · 2024-04-04T10:07:07 1712225227

To me what works is `cat /dev/null >! filename`.

jaimehrubiks · 2024-04-04T03:24:37 1712201077

Great to know

voidwtf · 2024-04-04T01:59:39 1712195979

It seems like Time Machine has been steadily declining. I'm not sure why there is no impetus to get it reliable and functioning well. Between sparse bundles becoming corrupt and having to start a new backup and failing functionality I haven't felt like Time Machine is worth setting up anymore. This is in stark contrast to the iOS/iPadOS backups which have worked every time.

ksec · 2024-04-04T02:22:39 1712197359

>It seems like Time Machine has been steadily declining.

Because they dont sell Time Capsule anymore. And they want you to backup everything to iCloud to grow their Services Revenue.

firecall · 2024-04-04T03:36:47 1712201807

Totally.

But iCloud isn’t a backup.

Its sync.

And it will happily sync corrupt files and does not provide any versioning.

The best TimeMachine is an SSD connected locally to the Mac.

The second best is an SSD running on a Mac setup with TomeMachine Server.

Then you are lucky if backups continue to work. And even luckier if you can sensibly restore anything via the hellscape that is the interstellar wormhole travel interface! ;-)

BackBlaze is very reliable at least! Not cheap with a house full of computers to backup though :-/

realusername · 2024-04-04T06:44:06 1712213046

> And it will happily sync corrupt files and does not provide any versioning.

It does not even provide a basic progress bar when used on the phone.

philistine · 2024-04-04T03:08:09 1712200089

Extrapolate it one more step: Apple is clearly working on an iCloud backup for Macs, cause then that's more services revenue. While they're doing that, why would they fix bugs in Time Machine. People can't surely be using this old thing while we're working on the spiffy new thing!

firecall · 2024-04-04T03:38:47 1712201927

The problem though is that it is likely to be complete environment backup and restore, like iOS.

But we shall see!

Hopefully they will provide the ability to backup and restore file versions!

ryukoposting · 2024-04-04T04:36:36 1712205396

As a non-Mac user, this sounds like a catastrophic and inexcusable bug the likes of which would inspire a dogpile of hatred against desktop operating systems with penguin mascots and/or headquarters in Washington.

ipv6ipv4 · 2024-04-04T15:53:02 1712245982

Time Machine works fine. Better than anything available on Windows. I’ve used it for more than a decade and a half, with multiple restores, across multiple machines.

My current desktop Mac environment is a direct descendant of my original Mac from 2004 thanks, largely, to Time Machine.

folbec · 2024-04-04T08:29:33 1712219373

That's the magic of genius marketing

wlesieutre · 2024-04-04T02:08:28 1712196508

I don’t know how many times I’ve had Time Machine decide it didn’t want to work anymore and I had to wipe the backups and start fresh to make it work again, but it’s a much larger number than it should be.

fmajid · 2024-04-04T07:00:15 1712214015

Mac OS quality control has been declining since they fired Scott Forstall, and it wasn’t amazing under his tenure to begin with.

aequitas · 2024-04-04T07:22:59 1712215379

I don't share this experience. I've been running Time Machine for years now on a Samba share for multiple Macs and if anything I've only seen an improvement. Years back I would regularly get a corrupted Time Machine sparse bundle that had to be recreated (or I would restore a previous ZFS snapshot and it would continue of off that), but it also ran over AFP back then I think, not SMB. Lately I've not have had any of these issues on any of the machines. I do have specific flags enabled in the smb.conf file that are recommended for Time Machine backups.

andruby · 2024-04-04T09:56:06 1712224566

Have you restored from a backup during that Time? I have a similar setup (sambsa share on a ZFS NAS) and was appalled at how long it took.

Both machines on wired gigabit ethernet, yet the restore took more than 24 hours. And that was for just a 1TB disk.

steve1977 · 2024-04-04T04:57:40 1712206660

The whole of desktop macOS has been steadily declining, so I‘m not surprised by this story in the slightest.

But hey, we get new emojis and moving desktop wallpapers…

thenickdude · 2024-04-04T04:43:47 1712205827

By contrast, ZFS has "slop space" to avoid this very problem (wedging the filesystem by running out of space during a large operation). By default it reserves 3.2% of your volume's space for this, up to 128GB.

So by adjusting the Linux kernel tunable "spa_slop_shift" to shrink the slop space, you can regain up to 128GB of bonus space to successfully complete your file deletion operations:

https://openzfs.github.io/openzfs-docs/Performance%20and%20T...

bell-cot · 2024-04-04T09:07:33 1712221653

Yes - and reserving a percentage of disk space (for this reason) was a routine feature of "real" filesystems decades before ZFS (or Linux) even existed.

It's kinda like how almost any 1980's MS-DOS shareware terminal program was really good at downloading files over a limited-bandwidth connection, but current versions of MS Windows are utter crap at that should-be-trivial task.

nobody9999 · 2024-04-04T05:46:22 1712209582

>By contrast, ZFS has "slop space" to avoid this very problem

As does ext4 (although they call the space "reserved blocks"). 'man tune2fs' for details. As well as most other modern (and not so modern[0]) filesystems.

[0] As I recall, the same was true for SunOS'[1] UFS back in the 1980s.

[1] https://en.wikipedia.org/wiki/SunOS

_flux · 2024-04-04T06:41:52 1712212912

In ext[234]fs the reserved blocks is something else though: they are reserved to a specific user, by default root. So if normal users fill out the filesystem, the root user still has some space to write. Sort of a simple quota system.

I believe this problem in is only relevant to CoW filesystems. With ext[234]fs you can set the reserved blocks to 0, fill the fs, and always remove files to fix the situation.

JdeBP · 2024-04-04T04:07:28 1712203648

People find it a confusing idea to grasp that deleting things actually requires more space, either temporarily or permanently. Other comments here have gone into the details of why some modern filesystems with snapshotting and journalling and so forth actually end up needing to allocate from free space in order to delete stuff.

In a different field: In the early decade of Wikipedia it often had to be explained to people that (at least from roughly 2004 onwards) deleting pages with the intention of saving space on the Wikipedia servers actually did the opposite, since deletion added records to the underlying database.

Related situations:

* In Rahul Dhesi's ZOO archive file format, deleting an archive entry just sets a flag on the entry's header record. ZOO also did VMS-like file versioning, where adding a new version of a file to an archive did not overwrite the old one.

* Back in the days of MS/DR/PC-DOS and FAT, with (sometimes) add-on undeletion utilities installed, deleting a file would need more space to store a new entry into the database that held the restore information for the undeletion utility.

* Back in the days of MS/DR/PC-DOS and FAT, some of the old disc compression utilities compressed metadata as well, leading to (rare but possible) situations where metadata changes could affect compressibility and actually increase the (from the outside point of view) volume size.

"I delete XYZ in order to free space." is a pervasive concept, but it isn't strictly a correct one.

jen729w · 2024-04-04T02:14:36 1712196876

I had this issue in October 2018 as documented in this Stack Overflow question, whose text I’ll paste below.

I was lucky: I had an additional APFS partition that I could remove, thus freeing up disk space. Took me a while to figure out, during which time I was in a proper panic.

---

https://apple.stackexchange.com/questions/338721/disk-full-t...

I’m in a pickle here. macOS Mojave, just updated the other day. I managed to fill my disk up while creating a .dmg, and the system froze. I rebooted. Kernel panic.

Boot to Recovery mode. Mount the disk. Open Terminal.

–bash–3.2# rm /path/to/large/file

rm: /path/to/large/file: No space left on device

Essentially the same issue as this Unix thread from ‘08! https://www.unix.com/linux/69889-unable-remove-file-using-rm...

I’ve tried echo x > /path/to/large/file, no good.

It’s borked. Does anyone have any suggestions that aren’t “wipe the drive and restore from your backup”?

bombcar · 2024-04-04T02:55:18 1712199318

Sounds like creating a sliver of an extra partition of a gig or so might be valuable insurance.

Kind of like the old Unix file systems that would reserve 5% for root.

pixelfarmer · 2024-04-04T08:31:30 1712219490

Since the SSD days (I started like 15 years ago with that) I keep a bit of space empty for two reasons: Emergency situations where you may need some extra space and to give SSDs a bit more room to relocate blocks to (they have a certain amount internally reserved already).

djmips · 2024-04-04T09:14:47 1712222087

I see in that stackexchange post, there's since been another alternative potential solution where you delete you virtual memory partition and if it's large enough it can give you back enough space to allow deleting of files to happen.

bouke · 2024-04-04T05:13:15 1712207595

With APFS this is not as straightforward though, as containers are only allocated when written too.

desro · 2024-04-04T03:16:12 1712200572

Impressive. I've never dealt with a situation where even `rm` failed, but I have had the displeasure of using and managing modern Macs with 256 GB (or less) of internal storage. I like to keep a "spaceholder" file of around 16GB so when things inevitably fill up and prevent an update or something else, I can nuke the placeholder without having to surgically prune things with `ncdu`

timenova · 2024-04-04T05:10:36 1712207436

I noticed CockroachDB does the same thing too, on node startup [0].

[0] https://www.cockroachlabs.com/docs/v23.2/cluster-setup-troub...

LeifCarrotson · 2024-04-04T03:43:36 1712202216

I find that one of the main benefits of a space holding file is that when it's needed, freeing up that space provides a window of time where you can implement a long-term solution (like buying a new drive with quadruple the storage space of the original for the cost of an hour of that employee/machine's time).

derekp7 · 2024-04-04T03:59:14 1712203154

Had an ex Navy Submariner for a manager many years ago. On some problem systems, he created several such files. He called them ballast files.

TazeTSchnitzel · 2024-04-04T08:25:17 1712219117

I've had a similar experience on my iPhone. The disk became so full that deleting things was seemingly no longer actually doing anything. Rebooting, the phone couldn't be logged into. Rebooting again, it boot-looped. Rebooting once more, it booted into an inconsistent state where app icons still existed on the home screen, but the actual app was missing, so the icon was blank and the app could not be launched. I became concerned about data integrity and ultimately restored from a backup.

I am certain this was a result of APFS being copy-on-write and supporting snapshotting. If no change is immediately permanent, but instead old versions of files stay around in a snapshot, then if you don't have enough space for more snapshot metadata you're in trouble. Maybe they skip the snapshot in low disk space situations, but they still have the copy-on-write metadata problem.

the-golden-one · 2024-04-04T12:56:59 1712235419

I’ve had the exact same thing. Amazing to think in 2024, despite all the clever APFS volume management stuff, you can still put a ‘sealed’ device such as an iPhone into a state where it has to be recovered by DFU just by filling up the user’s storage.

In contrast, after accidentally maxing out the space on my windows 11 office laptop which has a single data and boot volume, I was still able to boot it and sort the issue out.

entropicgravity · 2024-04-04T05:13:29 1712207609

I ran into a similar situation not long ago on the system partition of a linux installation. The partition was too small to begin with and as new updates piled up there was almost no space left to start deleting stuff. It took me about half an hour to find a subdirectory with a tiny bit of stuff that could be deleted. It was like being in a room so plugged up with junk that you couldn't open the (inward swinging) door to let yourself out.

From the tiny beginning I started being able to delete bigger and bigger spaces until finally it was clear and then of course I resized the partition so that wouldn't happen again. The End.

dataflow · 2024-04-04T05:24:26 1712208266

Confused, why didn't you just expand the partition to begin with?

And I feel like that ought to be the lesson for power users: always leave a bit of slack space after your partition.

entropicgravity · 2024-04-08T19:10:35 1712603435

My recollection is that gparted wouldn't allow me to adjust any partitions until I loosened up the available space in the system partition. I guess gparted checks all the partitions to make sure everything is as it should be before it allows changes.

albertzeyer · 2024-04-04T07:03:23 1712214203

When `rm file` gives you "No space left on device", a trick you can do:

    echo > file  # delete content of file first
    rm file  # now it should work

magicalhippo · 2024-04-04T07:25:32 1712215532

Reminds me of when a customer's database kept crashing with an error code indicating the disk was full. Except Windows Explorer showed the disk having hundreds of gigs free...

Took us a little while to figure out that the problem was the database file was so fragmented NTFS couldn't store more fragments for the file[1].

What had happened was they had been running the database in a VM with very low disk space for a long time, several times actually running out of space, before increasing the virtual disk and resizing the partition to match. Hence all the now-available disk space.

Just copying the main database file and deleting the old solved it.

[1]: https://superuser.com/questions/1315108/ntfs-limitations-max...

farkanoid · 2024-04-04T09:06:47 1712221607

My wife's IPhone 12 Pro Max had the same problem.

She somehow managed to fill up the entire 512GB. Updates were unsuccessful, she couldn't make calls and wasn't able to delete anything to make room.

She couldn't even back up her phone through iTunes, the only option was to purchase an iCloud subscription and back up to the cloud in order to access her photos.

_wire_ · 2024-04-04T06:28:39 1712212119

Ran into precisely this problem with a friend's Ventura Mini last year.

The solution was to boot into recovery and mount the Data partition using Disk Utility.

I don't recall where the Data partition gets mounted but I think it is:

"/System/Volumes/Macintosh HD - Data"

Or just Data, since Sonoma. It will be clear from Disk Utility.

Then close Disk Utility and go into Terminal and run rm on a big unseeded file.

You can find one using:

find <data-mnt> -size +100m

Using rm will fail.

Unmount the Data partition and run fsck on it.

This completes the deletion.

From there enough more space can be freed in recovery to have a healthy buffer, then reboot normally and finish cleaning.

It seems that when the volume gets so full that rm doesn't work anymore the filesystem also gets corrupted.

HTH and that I didn't forget anything.

wyldfire · 2024-04-04T03:27:16 1712201236

If you can truncate() an existing file (via 'echo > big_file.img' or similar), I would hope the filesystem could deallocate the relevant extents without requiring more space. Seems a bit like a filesystem defect not to reserve enough space to recover from this condition with unlink().

anticensor · 2024-04-04T05:06:14 1712207174

you need delete_and_dealloc(), not unlink()

whartung · 2024-04-04T02:38:16 1712198296

I had this happen to me, though I can’t recall how I fixed it.

In general I’ve had good success with Time Machine. I, too, have lost TM volumes. I just erased them and started again. Annoying to be sure but 99.99% of the time don’t need a years worth of backups.

The author mentioned copying the Time Machine drive. I have never been able to successfully do that. Last time I tried I quit after 3 days. As I understand it, only Finder can copy a Time Machine drive. Terrible experience.

That said, I’d rather cope with TM. It’s saved me more than it’s hurt me, and even an idiot like me can get it to work.

I did have my machine just complain about one of my partitions being irreparable, but it mounted read only so I was able to copy it, and am currently copying it back.

I don’t know if this is random bit rot, or if something is going wrong with the drive. That would be Bad, it’s a 3TB spinning drive. Backed up with BackBlaze (knock on wood), but I’d rather not have to go through the recovery process if I could avoid it.

Problem is I don’t know how to prevent it. It’s been suggested that SSDs are potentially less susceptible to bit rot, so maybe switching to one of those is a wise plan. But I don’t know.

skhr0680 · 2024-04-04T03:34:43 1712201683

> The author mentioned copying the Time Machine drive. I have never been able to successfully do that. Last time I tried I quit after 3 days. As I understand it, only Finder can copy a Time Machine drive. Terrible experience.

rsync -av $SOURCE $DEST has never let me down. Copy or delete on Time Machine files using Finder never worked for me.

> Problem is I don’t know how to prevent it. It’s been suggested that SSDs are potentially less susceptible to bit rot, so maybe switching to one of those is a wise plan. But I don’t know.

OpenZFS with two drives should protect you from bit rot. ZFS almost became the Mac file system in Snow Leopard.

pronoiac · 2024-04-04T04:03:14 1712203394

I have notes somewhere on roundtripping Time Machine backups between USB drives and network shares. (It's non-trivial, and it's not supported, but it worked.) It was with HFS+ backups, and there were various bits that were "Here Be Dragons", so I never posted them.

desro · 2024-04-04T04:55:05 1712206505

would be interested in a write-up of this if you ever get a chance

pronoiac · 2024-04-05T07:35:10 1712302510

I have a rough draft, I'll see if I can make it presentable.

armchairhacker · 2024-04-04T02:25:38 1712197538

btrfs used to have this issue, the problem being that the filesystem has to append metadata to do any operation including (ironically) deletion: https://serverfault.com/a/478742.

AFAIK it's fixed now, because btrfs reserves some space and reports "disk full" before it's reached. macOS probably does the same (I'd hope), but it seems in this case the boundary wasn't enforced properly and the background snapshot caused it to write beyond.

userbinator · 2024-04-04T02:40:25 1712198425

It looks like ZFS also suffers(ed?) the same problem:

https://zfs-discuss.opensolaris.narkive.com/BQ7RMcjo/cannot-...

jclulow · 2024-04-04T03:23:10 1712200990

Yes, ZFS has the same fundamental issue that all COW file systems have here. We have a reserved pool of space that you can't use except for operations like removing stuff.

The new problem with that reserved pool mechanism is that in 2024 it's probably way too big, because it's essentially a small but fixed percentage of the storage size. Don't let people use thresholds of total size without some kind of absolute cap!

willyt · 2024-04-04T09:10:46 1712221846

They mentioned that the problem occurred while steam was downloading, I wonder if because steam is ultra cross platform with bare minimum OS specific UI it is using something quite low level to write data to disk? Maybe NSFile does some checks that posix calls can’t do while remaining compliant with the spec or something weird like that. That would explain why people using various low level ´pro level’ cross platform tools like databases would have issues but typical garage band user is usually ok. If you’re doing database writes you probably don’t want the overhead of these checks making your file system performance look bad so it’s left to the software to check that it’s not going to fill up the file system. Stab in the dark hypothesis. I would hope that however we are writing data to the file system it shouldn’t be able to lock it up like this. I’d be curious for someone with technical knowledge of this to chime in.

themoonisachees · 2024-04-04T09:58:26 1712224706

Steam simply stores games in it's install folder, and while downloading the (compressed) game files it keeps them fragmented in a separate directory. As far as I can tell it doesn't employ special low-level APIs, because on lower power hardware (and even sometimes on gaming gear) the bottleneck is often the decompression step. This is what steam is doing when you are downloading a game and it stops using the network but the disk usage is going and the processor gets pinned at 100%.

I also heard of this happening to regular users downloading stuff with safari. It is simply terrible design on apple's part that you can kill a macOS install simply by filling it up so much that it becomes possible to not be able to delete files.

sspiff · 2024-04-04T09:28:10 1712222890

Still, you should not be able to brick your device into a state like this with legitimate, normal, non elevated operations.

If the POSIX API does have some limitation which would prevent this error from occurring with higher level APIs (which I sincerely doubt), macOS should simply start failing with errno = ENOSPC earlier for POSIX operations.

There is no other system that behaves like this, and we wouldn't be making excuses like this if Microsoft messed something basic up like this.

willyt · 2024-04-04T09:29:38 1712222978

I agree, though others are saying that BTRFS and ZFS can also get into this state.

sspiff · 2024-04-04T10:27:27 1712226447

I'd have to try, but have never encountered something like it on btrfs (though to be fair I've had many other issues and bugs with it over the years!)

I understand the logic, but typically I've seen filesystem implementations block writes once metadata volumes become close enough to full. Also, and I don't know if this is a thing on modern filesystems, you used to be able to reserve free space for root user only, precisely for recovering from issues like this in the past.

protoman3000 · 2024-04-04T08:39:39 1712219979

> I found that Sonoma has broken...

All too familiar. I have two Macs. I upgraded one of them to Sonoma and ever since then it has been nothing but headache and disappointment. Starting from the upgrade having failed (meaning I had to completely wipe the disk and install Sonoma from scratch, luckily I still had data), to problems with Handoff, the firewall seems to not work, Excel very slow etc etc.

I don't recommend using Sonoma.

andrelaszlo · 2024-04-04T10:44:30 1712227470

It's terrible :(

Bluetooth audio was a joke even before the update, and now it's almost unusable.

JackYoustra · 2024-04-04T06:51:22 1712213482

This happened to me! My solution was to go to an apple store, buy one of their portable SSDs right there, cp everything on to the SSD (that didn't appear to use any additional space!), wipe the mac, and then rm some unneeded stuff on the ssd before cp-ing back and using their no-fee return to return the SSD. There were a few esoteric issues, but for the most part it worked.

slillibri · 2024-04-04T16:19:30 1712247570

It seems odd that he acknowledges the existence of Time Machine local snapshots but doesn't mention deleting the local snapshots manually. Using `tmutil listlocalsnapshots` and `tmutil deletelocalsnapshots` will actually free up space. I had this experience recently when my local storage was at 99% and simple `rm` wasn't freeing up any space. Once I deleted the local snapshot, 100s of GB were freed.

As a side node, Time Machine has been pretty garbage lately. I back up to a local Synology NAS and letting it run automatically will just spin with `connecting to backup disk` (or some such message), but running manually works just fine.

Shorel · 2024-04-04T09:13:54 1712222034

This is a serious failure of the backup mechanism not being able to restore from a backup, and a serious failure of the operating system, not being able to delete files in a mounted external disk.

Luckily, all important files were in the cloud, and you could write a blog post describing these monumental failures.

quechimba · 2024-04-04T03:43:44 1712202224

I had an external hard drive that I overfilled by accident while making a manual backup of media files, and after that I couldn't even mount the APFS volume. Apparently it's something that can happen.

In the end I was able to mount and rescue the data using https://github.com/libyal/libfsapfs

I followed this guide: https://matt.sh/apfs-object-map-free-recovery

djmips · 2024-04-04T09:21:19 1712222479

Hey thanks for the knowledge!

delta_p_delta_x · 2024-04-04T06:31:24 1712212284

Huh, something like this happened to my mother's iPhone, too. She kept taking photos until the storage was filled to the brim.

One day she had discharged the battery completely, shutting down the phone; after recharging, she tried to restart it, only to be sent into a boot-loop. There is no (official) way to resolve this except repeatedly reboot and hope that at least once, Springboard loads and you can immediately jump into Photos and start mass-deleting, or at least connect to a computer and transfer the media out of the phone.

caseyy · 2024-04-04T02:56:12 1712199372

Windows with NTFS will also break in mysterious ways when the system disk is full.

There was a time when OSs could deal with the system disk being full quite well. And not so long ago.

benibela · 2024-04-04T12:51:32 1712235092

My Windows has been broken for a month, because the disk is full: https://superuser.com/questions/1834247/recovering-non-booti...

I do not know how to fix it without reinstalling

tredre3 · 2024-04-04T03:03:36 1712199816

I regularly run my Windows 10 NTFS drive down to 0 bytes free.

I'm always amazed at how the file system survives just fine, but the machine doesn't even crash!

I'm not sure where you got your experience from.

caseyy · 2024-04-04T03:53:16 1712202796

My work demands that I generate large amounts of data and I don’t know how much I’ll have to generate up-front. So I run out of disk space a lot.

My experience is that Windows and many of its programs will become very unstable with 0 b on the system drive. And about 3 times out of maybe 50, the system also became unbootable. I’ve learned to do whatever I can to free up space before restarting for stability.

The last time I’d regularly run out of space on Win was around Windows 98 times. I never had a problem then. Now in Windows 11 times, it’s a real headache.

Not sure how you’re so lucky.

Kwpolska · 2024-04-04T06:23:47 1712211827

I manage to get to 0 or close to that sometimes, usually through uncontorlles pagefile expansion. Some apps may misbehave, but explorer is stable enough to let me delete something.

juitpykyk · 2024-04-04T07:34:32 1712216072

Maybe you should use a secondary partition for work

caseyy · 2024-04-04T10:32:32 1712226752

Hehe, I probably should.

comex · 2024-04-04T03:58:07 1712203087

I’ve done the same on my Mac perhaps twice in the last few years. Like you, I encountered no crash or any other obvious consequences… I just deleted data and moved on. Though I didn’t try leaving it with zero bytes free for an extended amount of time, or rebooting. Who knows what would happen then.

Still, whatever this APFS bug is, the conditions to trigger it are more specific than just filling up the disk.

bloomingeek · 2024-04-04T03:46:23 1712202383

I tried, accidentally, to over fill the HDD on a Windows Vista machine. Vista popped up a box telling me I couldn't do it. Unfortunately, in my panic, I didn't take a picture of the warning for posterity.

sneak · 2024-04-04T02:09:30 1712196570

Time Machine has been consistently unreliable the entirety of the time since it launched well over a decade ago. It should be common knowledge that it sucks.

Use Backblaze if you don’t care about privacy, rsync+ssh to a selfhosted zfs box if you do.

jen729w · 2024-04-04T02:16:04 1712196964

Backblaze supports encryption, as does the excellent Arq, which is a better solution than rolling your own rsync+ssh.

lh7777 · 2024-04-04T03:04:26 1712199866

Backblaze Personal does support encryption, but it's always been incomplete. If you supply your own encryption key, it's true that Backblaze can't read your data at rest. But to restore files, you have to send your key to Backblaze's server, which will then decrypt the data so that you can download it. They say that they never store the key and promptly delete the unencrypted files from the server, but to me this is still an unnecessary risk. There's no reason why they couldn't handle decryption locally on the client device, but they justify on-server decryption in the name of convenience -- you can restore files via the web without downloading an app. If you're concerned about this, the solution is to use B2 with a 3rd party app like Arq.

philistine · 2024-04-04T03:15:04 1712200504

I actually use Arq to send my Time Machine backups and the rest of my NAS to S3 Glacier, in case the house burns down or the drives fail (whichever comes first). It works great and is very cheap!

sneak · 2024-04-04T08:26:11 1712219171

Arq is closed source and proprietary and its cryptographic functioning and integrity cannot be easily audited or verified.

Why use closed source crypto for money when free software that can be reviewed is available gratis? There are much better options.

philistine · 2024-04-04T22:54:58 1712271298

I can’t code. What good is it for me if the code is open source ? I can’t vouch for it and I don’t know anyone who vouches for open source code. Also, I have a Mac. Try to find open source software that runs for four years straight without a hitch on Mac. Arq has done that for me.

ink_13 · 2024-04-04T15:20:00 1712244000

That's not true, actually. Code for restoring from Arq backups is freely available: https://github.com/arqbackup/arq_restore

pitaj · 2024-04-04T04:28:52 1712204932

I'm curious, can you share more details? For instance, which Glacier tier do you use?

philistine · 2024-04-04T05:13:20 1712207600

The cheapest one. It would take either a long-ass time to restore or cost a lot of money, but I'm betting I'm not going to ever need it.

kstrauser · 2024-04-04T05:01:21 1712206881

Caution: restoring from Glacier can be hellishly expensive. Poke around at https://liangzan.net/aws-glacier-calculator/ and see what prices you see given your data size.

philistine · 2024-04-04T05:12:16 1712207536

Expensive external backups if I ever need it is better than none at all. It's a bet, but hey so is insurance.

EDIT: I checked your tool. It's a 1000 bucks to restore 4 TB in 48 hours. If the house burns down, insurance will cover that. I guess now I know I gotta check those drives a bit more.

kstrauser · 2024-04-04T05:30:10 1712208610

Ok, cool. As long as you know about it up front! I’ve heard nightmare stories of people being very surprised by their bill afterward.

Dylan16807 · 2024-04-04T07:43:39 1712216619

> It's a 1000 bucks to restore 4 TB in 48 hours.

What? This tool is exceptionally out of date. Retrieval cost is $30/TB at the high end, and for glacier deep archive and a 48 hour window it only costs $2.50/TB. (Plus a few cents per thousand requests, so maybe don't use tiny objects.)

Glacier's percentage-rate-based retrieval pricing was only active from 2012-2016.

The bandwidth charge of $90/TB is still accurate. Though there are ways to reduce it.

csnover · 2024-04-04T04:56:49 1712206609

> If you supply your own encryption key, it's true that Backblaze can't read your data at rest.

It’s worse than this. The private key for data decryption is sent to their server by the installer before you can even set a PEK. Then, setting the PEK sends the password to them too, since that’s where your private key is stored. So you have to take their word not just that they never store the key and promptly delete unencrypted files during restoration, but also that they destroy the unprotected private key and password when you set up PEK. It’s a terrible scheme that seems almost deliberately designed to lull people into a false sense of security.

abhinavk · 2024-04-04T02:15:32 1712196932

You can use backup tools like restic/borg/kopia to encrypt/compress before uploading to Backplaze or any other cloud service.

qiqitori · 2024-04-04T04:22:11 1712204531

Maybe flamebait, but here is my honest opinion, which I believe is aligned with the hacker ethos: maybe if you were using an open-source operating system you could, with a little experience, write a simple tool that deletes a couple files without allocating new metadata. (Or more likely, somebody else would have been there before you and you could just use their tool.)

Dylan16807 · 2024-04-04T07:22:49 1712215369

> maybe if you were using an open-source operating system you could, with a little experience, write a simple tool that deletes a couple files without allocating new metadata

For a filesystem where this happens, it would not be simple and it would require a lot of experience to get right.

> Or more likely, somebody else would have been there before you and you could just use their tool.

I don't think open-source makes such a tool much more likely to exist.

Kwpolska · 2024-04-04T06:20:19 1712211619

Most users on open-source operating systems can't code as well. And even if they could, this still requires knowledge or guesswork to find the trick for the deletion. Some people suggest truncation, and that's possible with a shell, but what would you do if it failed as well?

qiqitori · 2024-04-04T07:17:52 1712215072

I have an answer, but I'm not sure you'll find it particularly fulfilling because it is quite hypothetical.

The average user just needs to be able to ask the question in a decent place. E.g., Hacker News or a fitting Stack Exchange site. Some developers not afraid of touching the kernel will see the question (or one like it), and if no workaround (e.g. truncation) is found to be acceptable, someone may decide to look into the kernel source to see if it's feasible at all. They may find a lower level function that deletes without writing metadata. Or they may find the function in the filesystem driver's source code where the metadata is written first, and if that was successful, the actual data is written. In the easiest case, you could create a copy of the function with the calls swapped, and a live CD with the modified driver could be created. (Course, this solution is quite unsafe, as writing the metadata could still fail for some other or related reason, so it's a bit of an emergency solution.)

There are two other filesystems that were mentioned in the discussion here, btrfs and ZFS.

ZFS solved the problem by reserving space, so creating such a tool isn't needed. (However, ZFS is not part of Linux, so I'm not too interested in digging into the details.)

btrfs users apparently accept this as a fact-of-life, but have what they consider decent-enough workarounds, see e.g. https://www.reddit.com/r/btrfs/comments/ibjrpm/can_i_somehow....

(I use neither ZFS nor btrfs; I prefer boring filesystems, thank you very much.)

yjftsjthsd-h · 2024-04-04T05:46:17 1712209577

Does such a tool exist for BTRFS or ZFS?

bjoli · 2024-04-04T08:31:46 1712219506

I was in this situation with BTRFS, but it was simple to extend the partition with an old 2gb usb stick and the problem resolved itself.

yjftsjthsd-h · 2024-04-04T19:31:41 1712259101

Sure, but that's orthogonal to open/closed source.

RamRodification · 2024-04-04T06:48:11 1712213291

> ... Samba (the disk-sharing protocol), ...

Pet peeve (or alternatively: correct me if I'm wrong), Samba is not a protocol. It's a software suite that implements the SMB (Server Message Block) protocol

dannyw · 2024-04-04T02:06:47 1712196407

This happened to me too, without Time Machine. It’s devastating how rm on macOS can’t remove files when the disk is full.

ycombiredd · 2024-04-05T08:39:31 1712306371

I see a few other comments trying to address this with a similar truncation shell command but then a few arguing replies about the redirection operator not working due to some syntax error. So here’s the trick there… preface the redirection operator with a null command, like so

:> somefile

Note the colon, which stands in for the command as a null operator.

An additional hint is that if the system is so starved you can’t run ls, again call On the shell for help and use “echo *”, which will show a list of file names (without sizes) from which hopefully you can select a large file (or few) based on name alone. Colon-greater them into zero byte status and watch your system become useable enough to begin normal recovery efforts.

From an old Unix admin; I hope that helps someone.

saurik · 2024-04-04T04:35:37 1712205337

I know someone who ran out of disk space on her iPhone and then tried to fix some issues she didn't realize were being caused by it by upgrading the device to a new version of iOS; but then it failed the upgrade as it couldn't resize the disk but had already committed to doing such (I later laboriously figured this out by debugging the process using idevicerestore). I feel like this was a bug in its "how much space will I need to have to install successfully, let's verify I have enough before I begin" calculation, and maybe later versions of iOS have fixed the issue, but sadly they are all even larger and the fixed version would just prevent it from trying to upgrade in the first place, not fix it once it got to this point.

FartyMcFarter · 2024-04-04T10:23:25 1712226205

> Terminal: While Terminal would launch, using the standard Unix rm command resulted in a similar error: “No space left on device.”

This looks like a huge bug, and the elephant in the room.

What's the reason for `rm` requiring space left on the device?

dredmorbius · 2024-04-05T03:00:52 1712286052

I'd run into this situation earlier this year, on MacOS, though in my case the restore from Time Machine backup did work.

But I found myself in the exceedingly frustrating and confusing situation of not being able to free space by deleting files:

- 'rm <filename>' from bash

- Emptying Trash.

- Deleting ".Trashes" directories.

- Booting single-user and attempting to delete files.

- Booting to the Recovery system, invoking terminal, and attempting to delete files.

- Attempting to remove snapshots using the 'tmutil' utility, either booted normal, safe-mode, single-user, or rescue. Best I can tell, tmutil simply would not run under single-user or rescue modes.

Final solution was to repartition the hard drive, re-create filesystem, reinstall the OS, and recover user files from Time Machine. This ... took a while.

My take-away is that MacOS behaves exceedingly poorly under any number of high-resource-utilisation modes (memory, CPU, or disk usage).

Some discussion from the time on the Fediverse: <https://toot.cat/@dredmorbius/111849495055957910>

resource_waste · 2024-04-04T09:17:32 1712222252

The sheer number of people who have ran into this is a bit mind boggling.

Clearly Apple knows about this, they make no effort to fix it?

supermatt · 2024-04-04T09:17:07 1712222227

Sounds like there was no room to write to the journal (journaling is enabled by default on HFS+). Disabling the journaling for that volume (which it doesn’t appear you tried?) will have likely allowed you to perform your deletes.

diskutil disableJournal “/Volumes/Macintosh HD” (or whatever volume is)

djxfade · 2024-04-04T10:44:45 1712227485

New macOS versions use APFS, not HFS+. As far as I'm aware, you can't disable journaling on APFS.

bingo-bongo · 2024-04-04T02:14:19 1712196859

Probably needed to delete some of the mentioned disk snapshots, before trying to delete files.

https://support.apple.com/en-gb/guide/disk-utility/dskuf8235...

mproud · 2024-04-04T03:18:29 1712200709

Safe Boot is your magical way to have the computer delete purgeable and temporary files, like boot caches. Hold shift down and once it gets to the login window, restart again.

Otherwise, go to Recovery mode, mount the disk in Disk Utility, and then open Terminal and rm some shit.

klausa · 2024-04-04T03:58:41 1712203121

The author mentions trying this in recoveryOS to no success.

kazinator · 2024-04-04T09:11:20 1712221880

In the shell you can truncate files to zero size with a redirection:

  $ > large-file

It's possible that rm / unlink require working space to perform the transaction of removing the file from the directory, while truncation does not.