File Integrity 6: Which image format is most resilient? (eclecticlight.co)
57 points by zdw on April 22, 2020 | 52 comments



In general, most compressed file formats don't even bother to be resilient against errors, since error correction mostly runs against the point of compression (to make files as small as possible). Some file formats do have error detection - for example, PNG provides a CRC-32 of every chunk, and compressed formats like Zip and gzip have similar checksums.

If you happen to know (or guess) that a compressed file with a checksum has only 1-2 bytes in error, it is definitely possible to recover the original file through brute force. You can probably even do this for a file without a checksum just by testing which candidates parse correctly.
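
As a toy illustration of the brute-force idea (my own sketch, assuming the expected CRC-32 is known and exactly one byte is wrong):

    import zlib

    def repair_single_byte(data, expected_crc):
        # Try every position and every replacement value for one bad byte;
        # return the repaired bytes if some candidate matches the checksum.
        buf = bytearray(data)
        for i in range(len(buf)):
            original = buf[i]
            for candidate in range(256):
                if candidate == original:
                    continue
                buf[i] = candidate
                if zlib.crc32(bytes(buf)) == expected_crc:
                    return bytes(buf)
            buf[i] = original
        return None

That's roughly 255*n checksum runs for an n-byte payload, so it only makes sense on small chunks, but it shows how a stored checksum turns "detected" into "maybe correctable".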

However, corruption on filesystems rarely takes the form of a small number of bytes; oftentimes the file's metadata gets corrupted such that entire blocks of the file (anywhere from 512 bytes to 64KB) are swapped, dropped or lost. Instead of worrying about corruption-resistant image formats (since storing everything as uncompressed TIFF/BMP is incredibly wasteful), it's probably worth using a more corruption-resistant filesystem instead (i.e. not FAT!).


What’s surprising to me is that no file system (even ZFS) or database utilizes error correction codes, aside from 1-bit/byte ECC-style memory codes or RAID. There are a few forms of forward error correction [1], which are already widely used in hardware [2]. There are some projects aiming at long-term fixity of storage [3], but nothing default or built in, afaict.

I assumed RocksDB or some other modern KV engine would include FEC, at least for the metadata. Using DBs on embedded devices reveals how easily (and regularly) a whole DB of data can be corrupted. A few bytes of error and an entire IoT DB can be lost. FECs can be CPU-expensive, but not that expensive nowadays.

1: https://en.wikipedia.org/wiki/Forward_error_correction

2: https://www.electronicdesign.com/technologies/communications...

3: https://github.com/lrq3000/pyFileFixity


I would expect that filesystems do not integrate conventional FEC (e.g. block codes), because FEC is poorly suited to the kind of errors that affect filesystems.

FEC works against errors that have a predictable expected error rate, where errors are evenly distributed among the data. This applies to block codes (where FEC will have a maximum decodable error rate per block) and to convolutional codes, which don't have 'blocks' but still have a maximum local error density they can reliably correct. For media where burst errors are a possibility, FEC is almost always paired with interleaving to increase the likelihood of a ~uniform distribution of errors.
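
A toy sketch of the interleaving half of that, in Python (illustrative only; real systems do this in hardware or per sector):

    def interleave(codewords):
        # Write the codewords "column-wise" so bytes that sit next to each
        # other on the medium come from different codewords.
        depth, length = len(codewords), len(codewords[0])
        return bytes(codewords[row][col] for col in range(length) for row in range(depth))

    def deinterleave(stream, depth):
        # Inverse transform: gather every depth-th byte back into its codeword.
        length = len(stream) // depth
        return [bytes(stream[col * depth + row] for col in range(length)) for row in range(depth)]

A burst of up to `depth` consecutive bad bytes then lands as at most one bad byte per codeword, which is the kind of error the per-codeword FEC can actually correct.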

FEC works pretty well when the problem is "I wrote some data to disk that I can't read back correctly, but I can read the other data back just fine". At the media level, hard drives have used FEC to protect individual bits/sectors for a long time. Likewise at the disk level, RAID 4/5/6 do use FEC to protect against individual disk failures.
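
The parity part of RAID 4/5 is just XOR across the stripe; a minimal sketch (ignoring striping, rotation, and everything else that makes RAID hard):

    from functools import reduce

    def xor_parity(blocks):
        # Parity block = byte-wise XOR of all data blocks in the stripe.
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    stripe = [b"disk 0 data.", b"disk 1 data.", b"disk 2 data."]
    parity = xor_parity(stripe)

    # If disk 1 dies, its block is the XOR of the survivors and the parity.
    rebuilt = xor_parity([stripe[0], stripe[2], parity])
    assert rebuilt == stripe[1]

Which is exactly why it helps with "a whole disk went away" and not with "the filesystem wrote the wrong thing to all of them".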

Errors experienced at a filesystem level are not usually due to 'random' corruption of the data, but to invalid operations on the data structure e.g. a crash while flushing the data to disk which leaves the filesystem in an inconsistent state. In other words, the filesystem-specific errors are more likely to be "I wrote the wrong data to disk" or "I only wrote some of the data I meant to".

FEC will not defend against that well, because any FEC must be written to the disk along with the data, and keeping the FEC+data pair consistent is exactly the hard part. For example, the 'RAID 5 write hole' (which also affects RAID 1 and RAID 6) exists because physical disks will not commit data at exactly the same time even when the write commands are issued at the same time (which mostly won't happen anyway, since write commands tend to be serialized on your SCSI/Fibre Channel/whatever interface to multiple drives). That means there is a window where the data has been updated but the parity has not (or vice versa), and if a drive fails in that window, the RAID array may not be cleanly recoverable.

So for practical reasons, conventional FEC won't help recover much: if you write the wrong data to disk, you'll probably write the wrong FEC too, and if your allocation table gets corrupted and you write part of one file on top of another, you'll be overwriting the FEC as well.

BUT... error correction isn't just "conventional FEC". Error correction in general is any time you add redundancy to data to make it more tolerant to errors. In the filesystem space, this happens, but it isn't usually done with parity matrices or convolutional state machines. Instead, redundancy is added by things like log-structured file systems. The 'write what you plan to do, then write the actual data, then write that you successfully did it (in that order)' behavior of log-structured file systems does add redundancy and does make error recovery more possible.
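
A very rough sketch of that ordering in Python, nothing like a real journal, just to make the "redundancy through ordering" point concrete:

    import json, os

    def journaled_write(journal_path, target_path, data):
        # 1. Record the intent and force it to stable storage.
        with open(journal_path, "w") as j:
            json.dump({"target": target_path, "length": len(data), "state": "intent"}, j)
            j.flush()
            os.fsync(j.fileno())
        # 2. Write the actual data.
        with open(target_path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # 3. Record completion, again forced to stable storage.
        with open(journal_path, "w") as j:
            json.dump({"target": target_path, "state": "done"}, j)
            j.flush()
            os.fsync(j.fileno())

On recovery, a journal still in the "intent" state tells you the target may hold a partial write; the same fact is stored twice, in a known order, and that is the redundancy.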

In summary, "error correction" in general is used at several levels for file storage:

- block FEC at the bit level

- block FEC at the drive level if RAID is used

- robust CRC error detection + automatic repeat request over your PCIe/NVMe link or over a network connection

- redundant metadata in your filesystem journal

but the key is to use the right kind of error correction for the expected error conditions.


I suggest you read a bit more about reliable data transmission and FEC. Burst-error FECs are absolutely available and used (mostly FEC + interleaving, in that order).


> What’s surprising to me is that no file system (even ZFS) or database utilize error correction controls.

Maybe I misunderstand your statement, but ZFS has had error correction from the very first versions.


AFAICT, it only has checksums for data, and multiple metadata copies: https://serverfault.com/a/381918

So you’d need RAID for error correction.


I suppose you could patch in PAR encoding for data; would that be significantly different from specifying copies=N for data and/or making multiple single-disk pools and raiding them?


PAR encoding is much more space-efficient than storing multiple copies, though using it efficiently would be really tricky. Then again, no trickier than most other filesystem development.

If you can indeed RAID-z multiple pools across a single disk, that would give many of the same advantages, without having to duplicate everything. Obviously, it would also greatly affect performance, with every IO operation becoming N operations against one disk.


Why would PAR be tricky to use efficiently? I just see it as a file encoding.


It's only a file encoding if you do it per file, but that removes a lot of the advantages of sharing and minimising the recovery data. PAR is usually across multiple files/blocks, but then you've got to manage this extra data when any part of any of those files/blocks changes. The more complete/parallel your recovery data is, the more you need to update on even a small change to one file.

There'd be all sorts of ways to make it more efficient, but it would take some analysis to work out the best trade-offs.


You don’t need RAID. It’s easy to miss in the answer, but ZFS can be instructed to simply save data up to three times. This is in addition to metadata redundancy.


raidz{1,2,3} are error correction modes built into ZFS.


You can just use dm-raid for that if you like. Just because it's named "raid" doesn't mean you have to use it across disks.


True, and I’ve considered doing that, but it means you’re doing 2/1 or at best 3/2 storage rates. RAID only uses XOR, but Reed-Solomon or other FEC codes can produce way better error correction rates: say, full recovery at typical storage error rates using only 1/10 extra space, along with more computation at recovery time. Add CPU hardware support (as we have for AES) and then even battery-powered devices could use it.
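
For a feel of the numbers, here's a sketch with the third-party reedsolo Python package (pure illustration; newer versions return a (message, message+ecc, errata) tuple from decode):

    from reedsolo import RSCodec          # pip install reedsolo

    rsc = RSCodec(24)                     # 24 parity bytes per codeword, ~10% overhead
    data = bytes(range(200))              # fits in a single 255-byte RS codeword

    protected = rsc.encode(data)          # data followed by parity

    damaged = bytearray(protected)
    for pos in (3, 57, 101):              # flip a few bytes; up to 12 are correctable here
        damaged[pos] ^= 0xFF

    recovered = rsc.decode(bytes(damaged))[0]
    assert bytes(recovered) == data

Whether ~10% is enough depends on how bursty the corruption is, per the FEC discussion above.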


Article on FECs and storage; it seems Ceph utilizes FEC (erasure coding): https://www.networkcomputing.com/data-centers/putting-erasur...


The device mapper framework is open source. You could just extend it with additional ECC modes if those would be useful to you, and then these would become usable in both RAID and unRAID setups.


RAID uses more than XOR once you get past RAID5, usually.

The lesser-known bit about ZFS RAIDZ, for example, is that each level introduces a new error correction code. RAIDZ1 uses plain XOR. RAIDZ2 has one disk with XOR and one disk with Reed-Solomon. RAIDZ3 adds a third disk with Reed-Solomon, but implemented differently.

The effect is that you have 3 different codes available.


Many SAS disks offer larger sector sizes, which in turn allow storing such ECC/checksum data right there. But what e.g. Btrfs does is just checksum and RAID, because silent corruption is _really_ rare, and you'd want some amount of RAID anyway. Keep in mind that the disk already implements FEC at a low level.


Yes, btrfs works better than ext4 on embedded with respect to corruption. ZFS is too hard to set up. But on embedded devices silent corruption isn’t so rare: after 2-3 months of usage on an SD card it’s almost guaranteed, in my experience from last year. USB thumb drives & eMMC fare much better, but I’ve still lost entire DBs.

Given all of that, when corruption does happen you can frustratingly (with RocksDB at least) lose an entire DB due to a single checksum error. It’s good RocksDB has checksums, but an FEC code could both checksum and correct the data. Unfortunately off-device backups aren’t an option.


Might be an idea to offload that. You could reduce the risk of corruption by splitting the database over timeframes, so every 3 weeks, you create a new database, copy all currently relevant data and close out the old one. In case of corruption you can recover from an older version of the database, which is less likely to be corrupted the more versions you keep and the more often you do this (at the expense of flash lifetime).


If you have the space for storing data twice, you can actually use `dup` for the data, not just the metadata.

That conversion can be done online, though I managed to make it zero out all superblocks on all 3 disks while converting from raid1 to raid5. That's some time back, however, and there was some unfortunate power cycling.


I guess this means the PNG reader in the article example ignores the CRC-32? I wonder if a smarter reader could warn on error, assume a bit flip, and iterate to see if any single flip makes the computed CRC-32 match the embedded one?

Of course, the embedded CRC-32 checksum itself could have been corrupted.

While I agree this is better handled by the filesystem, images are invariably written to FAT from the get-go. And the default filesystem offerings on most OSes lack data error detection: NTFS, APFS, and ext4 being the top three. There are a few Linux distributions, most notably openSUSE, that default to Btrfs, which has checksums for everything. (I'm not certain whether FreeBSD defaults to UFS or ZFS; of course ZFS also checksums everything.)


> I'm not certain whether FreeBSD defaults to UFS or ZFS; of course ZFS also checksums everything.

There isn't really a default; the installer makes you pick UFS/ZFS.


Many PNG readers that I've tested do verify the CRC-32, but there are obviously many that don't. Verification chews up cycles that are useless 99+% of the time, and for PNG, verifying a chunk's CRC32 would delay decoding if the file is being streamed.

I made a CTF challenge about uncorrupting PNGs a few years back; here's a writeup from a team that solved it: https://github.com/ctfs/write-ups-2015/tree/master/plaidctf-.... In this particular case, a PNG was uploaded as "text" to an FTP server, which corrupts newline sequences. A bit of cleverness allows you to recover the original. (The PNG header even explicitly contains "newline" sequences to catch this type of corruption!)
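
For anyone who wants to check their own files: walking the chunks and verifying each CRC-32 is only a few lines of Python (the CRC covers the chunk type plus its data):

    import struct, zlib

    def check_png_crcs(path):
        with open(path, "rb") as f:
            assert f.read(8) == b"\x89PNG\r\n\x1a\n", "signature mangled (e.g. by a text-mode transfer)"
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break
                length, ctype = struct.unpack(">I4s", header)
                data = f.read(length)
                (stored,) = struct.unpack(">I", f.read(4))
                computed = zlib.crc32(ctype + data)
                print(ctype.decode("ascii", "replace"), "ok" if computed == stored else "CORRUPT")
                if ctype == b"IEND":
                    break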


JPEG can be made almost as resilient as an uncompressed format by adding in restart markers.

Restart markers are resynchronization points inserted into the compressed data at whatever interval you like, so the bitstream can be, well, restarted, limiting the effect of any corruption to the segment it occurs in.

`jpegtran` can insert them losslessly.
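
Something like this, if I remember the flags right (the interval is in MCU rows, or in MCU blocks with a "B" suffix):

    jpegtran -restart 1 -copy all original.jpg > resilient.jpg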


Interesting, didn't know. For arbitrary data I found "rsbep".


I wrote my little consistency checker (https://github.com/jwr/ccheck) specifically for checking my image archives against corruption. I am appalled that in this day and age we still (mostly) use tools and filesystems that have no way of knowing if data has been corrupted.

Before I wrote the tool, I searched online and couldn't find anything that would be small, simple, easily maintained, and would last for at least 10 years.

I now use it with all my archives, which means I know if a given replica is correct.


I like the idea of a utility that can add ECC to any arbitrary file. You'd need to ensure you had an uncorrupted copy of the utility that you could use in the future though, or the images become unreadable even if they're uncorrupted themselves - unless the ECC is contained in a sidecar file.


PAR2 or most file compression formats can fit the bill. Of course if it's going to stay put a filesystem that does this might be more convenient.


Yes, it might be more convenient when built into a filesystem. But now you must keep alive an OS that can handle that filesystem. That seems like a much greater burden for the long haul.


A previous post in the series tests a Parchive utility called MacPAR deLuxe:

https://eclecticlight.co/2020/04/20/file-integrity-5-how-wel...


Looks like a very useful utility, and it keeps the ECC in a .par2 sidecar file exactly as I wanted. Unfortunately it's Mac only.



Thank you very much for that link. Do the variants all use the same file format?

It looks like the sidecar file doesn't contain the ECC information, only checksums that allow you to know if the file is corrupted or not.


ECC plays the role of a checksum too.


That's the idea behind parity files (.par2). I wrote a tool [1] that helps you use it on whole filetrees, because checking state and error handling may get tedious with the regular par2 command.

[1] https://github.com/brenthuisman/par2deep


If you're afraid of file corruption, install par2 with homebrew.

My procedure for pictures is that monthly, I copy them off of my phone onto my storage. Then I run par2 in that directory to add a parity file. That all goes onto the backup.
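
For anyone who hasn't used it, the commands are roughly (par2cmdline syntax; -r is the redundancy percentage):

    par2 create -r10 photos.par2 *.jpg    # write ~10% recovery data
    par2 verify photos.par2               # check the files later
    par2 repair photos.par2               # reconstruct damaged ones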


I do the same. To help with 'maintenance' (adding par to newly added files, checking and handling integrity for existing ones) I wrote a small utility you may like: https://github.com/brenthuisman/par2deep


Very nice, thanks!


Genuine question: why would you want that? Isn't that the role of the file system or the data transport?


This might be a stupid question, but with a container format such as HEIF could you simply store multiple identical copies of an image inside the container? If one image gets corrupted you have the other to fall back on.

Two HEIF or AVIF images should be about the same size as a single JPEG too.


It's both much less wasteful and much more useful to add an error correction (FEC) block instead. With random errors spread across two copies, how do you know which chunks of each are correct? Sure, you could use a Merkle tree or similar, but FEC is a better solution. For size penalties of order 10% in practical use, FEC blocks can detect and correct errors not just in the data but in the FEC blocks themselves too.

Their biggest downside is that their computational complexity can be high. This is only a problem during creation and decoding on error; if the data is checksummed, you only need to take the slow path and touch the FEC block if an error is encountered. This is a great tradeoff for archival uses, and good for many other uses.


Is “vandal” by chance an open source utility? I’d like to test some other files and formats.


It doesn't sound like it would be hard to write your own. Not sure how useful it is though, since it doesn't make any attempt to replicate the kind of errors you would see in the wild.


Doesn't look like the code is freely available, although the author states they'll provide a copy of the app for research purposes if asked: https://eclecticlight.co/2020/04/19/last-week-on-my-mac-vand...


As an alternative option: zzuf[0] is meant to be a fuzzer, but its take on that is "randomly change random bytes in a file" (it can be run as a wrapper around a victim process like your image viewer, but it's happy to run on bare files).

[0] http://caca.zoy.org/wiki/zzuf
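
e.g. something like this, going from memory on the flags (-s is the seed, -r the bit-flip ratio), so check the man page:

    zzuf -s 1234 -r 0.004 < clean.png > corrupted.png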


Would you outperform uncompressed TIFF with sufficient copies of a compressed file? Or compressed file plus parity file?


There used to be a format that was highly resilient to errors. It was called film.


Of course, point defects in it - or linear ones, say, from a bad projection device - are rather final, and the format has a host of inconveniences: fire hazard, inability to be transferred over networks, comparatively huge copying costs, large physical size demands, etc.

We can perhaps use a convenient digital format for images and add to it, ahem, anti-corruption measures. A single file is still going to be vulnerable - though to a lesser extent - even if you, say, add some Reed-Solomon protection; but now you can use distributed networks the size of a planet, or even larger. Organizing such storage could be an interesting problem - at least partially solved already.


I'm not saying film is perfect. But it's an often overlooked choice that has worked well for the last 50 years.


As long as you didn't send it through an airport X-ray machine, among other issues.


X-ray machines are only a problem for undeveloped film. Developed film is much more resilient.



