It would be reasonable to interpret the shorter boast as making the very surprising claim that it beats anything else, including lossy JPEG, in all categories of performance, including compression ratio. As it turns out, they don't intend to claim that, because it's not true. It's probably impossible for a lossless file format to do that, even for commonly occurring images. (It's certainly possible for a lossless image format to do that for images that have a lot of redundancy in a form that JPEG isn't sophisticated enough to recognize.)
Surely nobody could interpret this as beating an arbitrarily lossy format. I could just erase the entire input, after all. One would have to incorporate quality into the metric. Because of that, it seems natural to assume they mean lossless.
If a given lossy format is inefficient enough, then it’s possible for a lossless format to beat it at compression. I initially interpreted the website as making the surprising claim that FLIF beats several popular lossy formats while itself being lossless. Most people understand “JPEG” to mean the popular lossy format, JPEG. The site could be a lot clearer that that they are actually comparing it to lossless JPEG, lossless WEBP, etc.
> It's probably impossible for a lossless file format to do that, even for commonly occurring images.
It's actually provably impossible using a simple counting argument. A lossy algorithm can conflate two non-identical images and encode them the same way while a lossless algorithm can't, so on average the output of a lossless algorithm is necessarily larger than a lossy one because it has to encode more possible outputs.
Yes, of course it's impossible to losslessly compress all possible images by even a single bit. But how predictable is the content of typical images? How much structure do they have? They certainly have a lot more structure than PNG or even JPEG exploits. Some of the “loss” of lossy JPEG is just random sensor noise, which places a lower bound on the size of a losslessly compressed image, but we have no idea how much.
Doesn't matter. Whatever you do to compress losslessly you can always do better if you're allowed to discard information. And the structure is part of the reason. For lossless compression you have to keep all the noise, all the invisible details. With lossy compression you're allowed to discard all that.
You should take more time before you start throwing out terms like "provably impossible". A sufficiently bad lossy algorithm can be worse than an algorithm like PNG in every single case. For example, a format based on PPM that can lossily reduce the file size by up to half.
> A sufficiently bad lossy algorithm can be worse than an algorithm like PNG in every single case.
Well, yeah, obviously if you want to shoot yourself in the foot that is always possible. But for any lossless algorithm it is trivial to derive a corresponding lossy algorithm that will compress better than the lossless one.
Fair enough. Nonetheless, a lossy compression algorithm by definition produces fewer possible outputs than possible inputs. A lossless compression algorithm, again by definition, produces exactly as many different outputs as there are possible inputs. Therefore, on average, unless the lossless algorithm has a perverse encoding that uses more bits than necessary to encode the possible outputs, no lossless algorithm can beat it on average. It is of course possible that a lossless algorithm might beat a lossy one on some select inputs. It's even possible that those inputs are the ones you are interested in, but in that case you are using the wrong lossy algorithm.
> Therefore, on average, unless the lossless algorithm has a perverse encoding that uses more bits than necessary to encode the possible outputs, no lossless algorithm can beat it on average
I think you mean "unless the lossy algorithm has a perverse encoding."
But your argument proves too much. It's true that no lossless algorithm can beat a lossy algorithm if averaged across all possible images, for precisely the reason you say. But for the same reason, no lossless algorithm can beat the "compression algorithm" of no compression at all, like BMP but without redundancy in the header, averaged across all possible images.
The unknown question — which I think we don't even have a good estimate of — is how much incompressible information is in the images we're interested in compressing. Some of that information is random noise, which lossy algorithms like JPEG can win by discarding. (Better algorithms than JPEG can even improve image quality that way.) But other parts of that information are the actual signal we want to preserve; as demonstrated by Deep Dream, JPEG doesn't have a particularly good representation of that information, but we have reason to believe that on average it is a very small amount of information, much smaller than JPEG files.
If it turns out that the actually random noise is smaller than the delta between the information-theoretic Kolmogorov complexity of the signal in the images we're interested in compressing and the size of typical JPEG files for them, then a lossless algorithm could beat JPEG, as typically applied, on size. Of course, a new lossy algorithm that uses the same model for the probability distribution of images, and simply discards the random noise rather than storing it as a lossless algorithm must do, would do better still.
We are seeing this in practice for audio with LPCNet, where lossy compression is able to produce speech that is more easily comprehensible than the original recording: https://people.xiph.org/~jm/demo/lpcnet_codec/
It seems unlikely to me that the ideal predictor residual, or random noise, in an average photo (among the photos we're interested in, not all possible Library-of-Borges photos) is less than the size of a JPEG of that photo. But we don't have a good estimate of the Kolmogorov complexity of typical photos, so I'm not comfortable making a definite statement that this is impossible.
For example, one unknown component is how random camera sensor noise is. Some of it is pure quantum randomness, but other parts are a question of different gains and offsets (quantum efficiencies × active sizes and dark currents) in different pixels. It might turn out that those gains and offsets are somewhat compressible because semiconductor fabrication errors are somewhat systematic. And if you have many images from the same camera, you have the same gains and offsets in all of the images. If your camera is a pushbroom-sensor type, it has the same gains and offsets in corresponding pixels of each row, and whether it does or not, there's probably a common gain factor per pixel line, related to using the same ADCs or being digitized at the same time. So if your lossless algorithm can model this "random noise" it may be able to cut it in half or more.
> I think you mean "unless the lossy algorithm has a perverse encoding."
That's right.
> It's true that no lossless algorithm can beat a lossy algorithm if averaged across all possible images
I concede the point. I thought that this argument applied to the average of any input set, not just the set of all possible images, but then I realized that it's actually pretty easy to come up with counterexamples e.g. N input images, which can be losslessly compressed to log2(N) bits, which will easily outperform jpg for small N. That's not a very realistic situation, but it does torpedo my "proof".
It's probably still true that no lossless format could possibly beat lossy JPEG for typical photos.
If I understand your example correctly, it only has the lossless algorithm beating JPEG by cheating: the lossless algorithm contains a dictionary of the possible highly compressible images in it, so the algorithm plus the compressed images still weighs more than libjpeg plus the JPEGs. But there are other cases where the lossless algorithm doesn't have to cheat in this way; for example, if you have a bunch of images generated from 5×5-pixel blocks of solid colors whose colors are generated by, say, a known 68-bit LFSR, you can code one of those images losslessly in about 69 bits, but JPEG as typically used will probably not compress them by an atypical amount. Similarly for images generated from the history of an arbitrary 1-D CA, or—to use a somewhat more realistic example—bilevel images of pixel-aligned monospaced text in a largish font neither of whose dimensions is a multiple of 8, say 11×17. All of these represent images with structure that can be straightforwardly extracted to find a lossless representation that is more than an order of magnitude smaller than the uncompressed representation, but where JPEG is not good at exploiting that structure.