
Since there seem to be experts here in the discussion: I keep thinking about a little toy open-source project to address bitrot in Time Machine backups. A utility that compares checksums of the original file against its backup. Apparently Apple started using hashes internally in Time Machine, and it would probably be simplest to use those. However, I cannot find any documentation on the internals of Time Machine. Does anyone know where to start looking? Would such a tool make sense at all?
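For what it's worth, the core of such a utility can be sketched without knowing anything about Time Machine internals. The following is only a rough illustration, assuming the backup is reachable as an ordinary directory tree that mirrors the source; `file_digest` and `compare_trees` are made-up names, and SHA-256 is my choice here, not whatever Time Machine stores.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source_root, backup_root):
    """Yield relative paths whose contents differ between source and backup.

    Assumption: backup_root mirrors the layout of source_root.
    Files that exist only in the source are skipped, since a file that
    has not been backed up yet says nothing about bit rot.
    """
    source_root, backup_root = Path(source_root), Path(backup_root)
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if not dst.is_file():
            continue
        if file_digest(src) != file_digest(dst):
            yield rel
```

A mismatch reported by this only says the two copies differ; it cannot tell which side changed or why.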

I really like this idea. It's a relatively simple approach that could work with any fs/backup strategy. It would be great if it wasn't dependent on Time Machine.

ZFS/RAID plus ECC RAM is complicated and expensive, and even that is not a 100% guarantee against bit rot.

I found this and have been using it: https://github.com/rselph/sumcheck

Does it make sense? No idea.

tmutil verifychecksums should already do this.

As far as I understand it, this verifies checksums within the Time Machine backup by comparing hashes with the ones that were created during backup. However, it does not compare the files against the ones on the system. So it protects against bitrot on the backup disk only.

Well, you'll need some kind of heuristic to figure out whether a mismatch is due to bit-rot or because the working file has changed. Maybe the modification date, but a) what if a file is restored from an older backup, and b) what if the mtime itself is corrupted (APFS would guard against this, I suppose)?
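A crude version of the modification-date heuristic might look like this. It is a sketch only: `classify_mismatch` is a hypothetical helper, it assumes the contents are already known to differ, and it assumes the backup copy preserves the source file's size and mtime.

```python
import os


def classify_mismatch(source_path, backup_path):
    """Guess why two copies with different contents disagree.

    Assumption: the caller has already found that the contents differ.
    Returns "modified" if size or mtime differ (probably an ordinary
    edit or a restore), else "suspect" (same metadata but different
    bytes, which is what bit rot would look like).
    """
    s = os.stat(source_path)
    b = os.stat(backup_path)
    if s.st_size != b.st_size or s.st_mtime_ns != b.st_mtime_ns:
        return "modified"
    return "suspect"
```

This does nothing about caveat b): if the mtime itself is corrupted, the file is classified as "modified" and the rot goes unnoticed.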

In any case, the "hash" appears to be CRC32, stored in extended attributes:

  $ xattr .inputrc
  $ xattr -px 'com.apple.finder.copy.source.checksum#N' .inputrc
  26 E5 4A AB
  $ cksum .inputrc
  2873812262 65 .inputrc
  $ printf '%x\n' "$(cksum .inputrc | cut -d ' ' -f 1)"
  ab4ae526
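The two values in the transcript are the same number in different byte orders: the xattr bytes 26 E5 4A AB read as a little-endian integer give the decimal value that cksum prints. A small check of that arithmetic, where `xattr_hex_to_int` is a made-up helper and the little-endian layout is inferred from this one example rather than from any documentation:

```python
def xattr_hex_to_int(hex_dump):
    """Interpret an `xattr -px` style hex dump as a little-endian unsigned int.

    Assumption: the attribute stores the checksum least significant
    byte first, as the single example above suggests.
    """
    raw = bytes.fromhex("".join(hex_dump.split()))
    return int.from_bytes(raw, "little")


xattr_value = xattr_hex_to_int("26 E5 4A AB")
cksum_value = 2873812262  # first field of the cksum output above

print(xattr_value == cksum_value)  # True
print(format(cksum_value, "x"))    # ab4ae526
```

Note that the CRC computed by POSIX cksum is not the same algorithm as the zlib/PNG CRC-32, so Python's `zlib.crc32` will not reproduce these values.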
