
>Filesystems don't zero out deleted data and dd isn't aware of the filesystem mapping, so unless it's a completely fresh drive, you'll still pull off garbage data.

That may have been true a very long time ago.

Today, some filesystems such as ZFS can effectively, and automatically, zero out deleted data by issuing trim on devices that support it. (I say "effectively" because we don't actually know whether the underlying device literally zeroes the data, but that distinction doesn't matter here: once trimmed, those logical sectors will read back as zeros until something new is written to them, and that is the behavior that matters in this context.)
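
With OpenZFS, for example, this is a per-pool setting; a rough sketch, where the pool name "tank" is just a placeholder:

    # Check whether the pool issues trim automatically as space is freed
    zpool get autotrim tank

    # Turn it on, or run a one-shot manual trim of the pool's free space
    zpool set autotrim=on tank
    zpool trim tank

    # Watch trim progress
    zpool status -t tank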

This is useful for SSDs, and also for SMR spinny-disks.

Plenty of other filesystem and operating-system combinations also handle trimming of unused space quite well, though this more often happens as a scheduled task than as something the filesystem itself takes care of.
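
On a typical Linux box, for instance, that scheduled task is usually util-linux's fstrim run from a weekly systemd timer; a rough sketch:

    # One-shot: trim free space on all mounted filesystems that support it
    fstrim -av

    # The usual scheduled approach: a weekly systemd timer
    systemctl enable --now fstrim.timer
    systemctl list-timers fstrim.timer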

Trim (in its various implementations) has been in broad use for well over a decade, and a trimmed device can allow dd to produce sparse output files.
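
A rough sketch with GNU dd (device and file names are placeholders): conv=sparse makes dd seek over blocks of zeros instead of writing them, so trimmed regions become holes in the output file.

    # Image a trimmed device; runs of zeroed sectors become holes
    dd if=/dev/sdX of=disk.img bs=1M conv=sparse status=progress

    # Compare the logical size against what is actually allocated
    du -h --apparent-size disk.img
    du -h disk.img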

---

Now, that said: It probably doesn't matter much if a particular dd-esque tool is set to create sparse output files or not. Sure, some space may be saved, and sparse files sure are cute and cuddly.

But it's probably a fool's errand to even plan such an operation on a machine that has less free space than the total maximum capacity of the thing being rescued: Either there's enough room to write a non-sparse image, or there isn't enough room to even think about starting the process since it might not be able to complete.

(If space becomes an issue later on down the road, the output file can be "sparsified" in-place using "fallocate --dig-holes" in instances where that makes sense.)
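
Something along these lines, with the filename just an example:

    # Punch holes in-place wherever the file contains runs of zero blocks
    fallocate --dig-holes disk.img

    # Logical size stays the same; allocated size shrinks
    du -h --apparent-size disk.img
    du -h disk.img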

And I definitely want the whole disk imaged, which means that I definitely do not want ddrescue's interpretation of metadata to determine filesystem allocation and limit the scope of that image: This is the first step of a data rescue operation, and that makes it the worst place for data to be intentionally thrown away or disregarded.
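
With GNU ddrescue, that whole-device copy looks roughly like this (device and file names are placeholders); the mapfile is what lets later passes go back and retry the bad spots:

    # First pass: copy the entire device, skipping over bad areas quickly
    ddrescue -d -n /dev/sdX disk.img disk.map

    # Later passes: retry the areas the mapfile recorded as bad
    # (-S can be added if sparse writes to the output file are wanted)
    ddrescue -d -r3 /dev/sdX disk.img disk.map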

If things are failing hard enough that any of this work is on the table, then obviously the combination of the source disk and filesystem is untrustworthy -- along with the metadata.

Getting all of the bits backed up -- regardless of their apparent lack of importance -- should always be the prime directive here. Any extra bits can always be tossed later if they're eventually deemed to be actually-unimportant.
