
Objective metrics (like dssim) are not the best way to compare image codecs. This is in part for the obvious reason - that none of them are perfect and the human eye is better - but also for other reasons, including that some codecs use metrics internally for bitrate targeting, and these metrics can disagree with the metrics used by other codecs. Without a visual comparison, it's impossible to say which metrics are better for a given set of images.

JPEG XL was based (in part) on Google's "pik" codec, which used the Butteraugli metric for bitrate targeting.

Use your eyes instead: https://afontenot.github.io/image-formats-comparison/#wasser...




Objective metrics have many weaknesses and pitfalls, but keep in mind that just eyeballing images is not necessarily better.

A non-blind, n=1 human test can be just as flawed. What you like in one particular scenario is not representative of a codec's overall performance.

Testing with humans requires a proper setup and a large sample size (which, BTW, JPEG XL has done!)

The problem is that these codecs are close enough in performance to be below the "noise" level of human judgement.

You will not be able to reliably distinguish q=80 vs q=81 of the same codec, even though there is objectively a difference between them. And you can't lower quality to potato level to make your job easier — that changes the nature of the benchmark.

People also just differ in opinion on whether blurry-without-detail is better than detailed-but-with-blocking-or-ringing. If you ask people to rate images, the standard deviation of their scores will be pretty large, so wide that the scores of objective metrics can fit within 1 sigma.
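To make the 1-sigma point concrete, here is a toy computation; the opinion scores below are made up purely for illustration, not real survey data:

```python
from statistics import mean, stdev

# Hypothetical 1-5 opinion scores from a made-up panel rating the same
# image encoded by two codecs; the numbers are illustrative only.
codec_a = [4, 3, 5, 2, 4, 3, 4, 5, 3, 2]
codec_b = [3, 4, 4, 3, 5, 2, 3, 4, 4, 4]

mos_a, sd_a = mean(codec_a), stdev(codec_a)
mos_b, sd_b = mean(codec_b), stdev(codec_b)

print(f"codec A: MOS={mos_a:.2f} +/- {sd_a:.2f}")
print(f"codec B: MOS={mos_b:.2f} +/- {sd_b:.2f}")

# The MOS difference is well inside one standard deviation of either
# panel, so a sample like this cannot separate the two codecs.
print("within 1 sigma:", abs(mos_a - mos_b) < min(sd_a, sd_b))
```

With scores this noisy, any metric whose prediction lands inside the error bars is statistically indistinguishable from the human panel.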

People also tend to pick the "nicer" image, rather than the one that is closer to the original. That's a somewhat different task than what codecs aim for.

Codecs can allocate more or less of the file to color channels, so you can get different conclusions based on e.g. amount of bright red in the image.

So testing is hard. Plenty of pitfalls. Showing a smaller file that "looks the same" is easy, but deceptive.


> People also tend to pick the "nicer" image, rather than the one that is closer to the original. That's a somewhat different task than what codecs aim for.

Samsung turns their screens and phone cameras to “vivid” processing by default (they do provide a “natural” toggle, to their credit).

I think over 90% of people don’t realize what they are seeing is not close to reality, and instead ask people with other phones, especially iPhones, why their screen or photo looks so washed out.

There are many faults to BDFLs, but I love that Apple provides “vivid” processing as an option and “natural” as the default. Sometimes the masses just don’t know what’s actually good for them.


The question with photos is whether people want something close to reality, or whether they would rather have an exaggerated, arguably more beautiful version of reality to remember and share. I guess unless you're doing journalism, arguing for less processed photos may be a moot point, at least for phone cameras, as those are overwhelmingly used for snapshots.


Agreed that large double-blinded surveys with trained participants are pretty much the gold standard, and that there's no true objectivity about e.g. blurring vs blocking.

> Showing a smaller file that "looks the same" is easy, but deceptive.

It's much better to show same-sized files and let the viewer assess their quality (this is what the comparison I linked does), but there are deceptive ways to do even this.

See also this great old blog post about doing comparisons for video codecs: https://web.archive.org/web/20141103202912/http://x264dev.mu...


Hm, if you zoom in you can clearly see the difference between jxl and avif. The problem is rather the inconsistency of the results.

For instance, at medium quality JXL seems to be better at preserving fine details and structure, like the mark on the door and the traces on the lower part of the bridge, but AVIF appears to be better at preserving the clarity of complex details, like far-away windows, cars, a tennis racket.


Lossy codecs intentionally allow distortions that are too small to see with the naked eye (without zooming in). They're designed to operate at some "normal" viewing distance. If you zoom in, you defeat that technique.

In case you actually want to compress images specifically for viewing zoomed in, you should use different codecs, or higher quality, or configure codecs differently (e.g. in classic JPEG, make the quantization table preserve high frequencies more).

But for a benchmark that claims to compare codecs in general, you should only use normal viewing distance. Currently it's controversial whether the norm is still ~100dpi or whether it should be the "Retina"/2x resolution, but definitely it's not some zoomed-in 5dpi.
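As a sketch of the classic-JPEG tuning mentioned above: the table below is the standard luminance quantization table from Annex K of the JPEG spec, and the scaling function is my own illustrative helper (not something encoders ship) that shrinks the quantization steps for high-frequency coefficients so fine detail survives zoomed-in viewing, at a file-size cost:

```python
# Standard JPEG luminance quantization table (Annex K of the spec),
# in row-major order; larger values quantize a frequency more coarsely.
BASE = [
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99,
]

def preserve_high_freq(table, strength=0.5):
    """Shrink quantization steps for high-frequency DCT coefficients.

    Entry (row, col) covers higher spatial frequencies as row + col
    grows; scaling those entries down keeps more fine detail, which
    matters if images will be inspected zoomed in.
    """
    out = []
    for i, q in enumerate(table):
        row, col = divmod(i, 8)
        freq = (row + col) / 14          # 0.0 (DC) .. 1.0 (highest AC)
        scale = 1.0 - strength * freq    # leave DC alone, shrink high AC
        out.append(max(1, round(q * scale)))
    return out

tuned = preserve_high_freq(BASE)
print("DC step unchanged:", tuned[0] == BASE[0])
print("highest-frequency AC step:", BASE[63], "->", tuned[63])
```

How you feed a custom table to a particular encoder varies (e.g. Pillow exposes a `qtables` option on JPEG save); the point here is just the direction of the adjustment.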


To me, AVIF seems to be better for preserving sharp edges, JPEG XL seems better for preserving textures.


If you used nothing but nails to build various styles of beds, you wouldn't have the same experience sleeping on them as the originals; unless, of course, the original was a bed of nails.

Sharp edges are just one texture in an infinite range of textures, and AVIF looks like it constructs everything out of sharp edges in a way that's really obvious to the eye at all compression levels.

With JPEG XL I can at least tell what's missing, or too artifacted to make out. With AVIF you have no idea what has been completely erased.


AVIF looks better to me at low bitrates, JXL has much more visible block artifacts:

https://afontenot.github.io/image-formats-comparison/#vid-ga...


I think this is definitely the most common response. AVIF, as a video intra-frame based codec, works best at very low bitrates. JPEG XL is considerably better at high bitrates.


I'm guessing the reason is that, for predicting video frames, hallucinating detail is undesirable, so you would rather remove detail than add non-existent detail. AVIF also seems to have some kind of deblocking filter, which JXL lacks, to my surprise.


You can also adjust that deblocking in AVIF.


AVIF's deblocking filter works one axis at a time, whereas JPEG XL applies an axis-non-separable filter, a 2D selection at once. It is not clear that AVIF can be parameterized to do filtering similar to JPEG XL's; at least it hasn't been done yet.



This may simply be a case of each codec having its own strengths. However, I also wonder whether part of the issue is that the "small" compression size you're linking to in these examples just isn't good enough; you're trading between the kinds of artifacts you can tolerate, and in general JXL doesn't appear to do as well as AVIF at low quality settings.

But consider also e.g. https://afontenot.github.io/image-formats-comparison/#reykja... - even at "large" quality settings AVIF pretty visibly distorts the sky, and there's some blur in the nearby trees too.

In any case, compared to even the fairly good (for JPEG) mozjpeg encoder, it's clear both of these codecs are much better than the status quo, and not that different from each other: neither wins universally against the other, but both pretty clearly beat JPEG.

A fairly simple heuristic seems to be: if you want images at the tiny size, pick AVIF. At small, pick AVIF unless you really, really want to preserve texture over detail. At medium, pick JXL for texture and AVIF for detail; and at large, pick JXL.

Browsing through these images in general, I think I'd usually pick jxl at medium or even sometimes large settings; small simply has too many artifacts in general (but if I had to use that - avif), and at better quality I (personally) find the distortion to texture more noticeable than loss of detail. I guess it depends on how important compression ratio is to you?


Photographic JPEG images on the internet are used at around 2.0 to 2.5 BPP, while current practice for WebP and AVIF is around 1 to 1.7 BPP.

People on the internet don't like to store photos at 0.5 BPP even with the latest and greatest codecs; it gets too blurry and artefacty.

This is not a statement of my personal aesthetic opinion but observing what gets done out in the wild.


Those images are 1.0, 0.57, 0.33, and 0.22 BPP.

Usually we store images at 3.5-5 BPP (cameras), 10+ BPP (RAW or similar, for editing), and 1-2 BPP for internet use.

The actual bitrates depend a lot on the image -- graphics with simple backgrounds need less, photos with a lot of sky need less, and busy, detailed images, particularly nature, need more.

While there is one 1.0 in the test, it is for an extremely busy image which would be better stored for internet use at 3+ BPP.

0.22 BPP is almost never used for photographs in the Internet.

See https://almanac.httparchive.org/en/2022/media#fig-15 for median bitrates in 2022 (1.0, 1.4 and 2.1 BPP for lossy formats)
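For reference, BPP follows directly from file size and pixel count; a minimal sketch, with hypothetical example numbers:

```python
def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Bits per pixel: total bits in the file divided by the pixel count."""
    return file_size_bytes * 8 / (width * height)

# Hypothetical example: a 230 kB file at 1920x1080.
bpp = bits_per_pixel(230_000, 1920, 1080)
print(f"{bpp:.2f} bpp")  # ~0.89 bpp, in the typical web-delivery range
```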


You're probably aware of this, but the comparison tool allows selecting larger images. That first image is 3 bpp if you select "large".

> Usually we store images at 3.5 - 5 BPP (Cameras)

That's appropriate for JPEGs, but given that more recent compression algorithms do a better job, it's probably worth looking at lower bitrates. I pulled some JPEGs off my Canon DSLR for example, and they're around 2-4 bpp for landscape photos.

It's not surprising to see acceptable JPEG XL images with half that bitrate.


Thank you.


Would this be a big challenge to fix in JXL?


much of the gains of AVIF at lowest qualities come from features that don't exist in JPEG XL: wedges, large-support of the blurring filter, directional prediction

these features are non-helpful at normal photography bitrates and only complicate coding at bitrates above 1.5 or so

JPEG XL has similar approaches but its tools have a larger quality operating range

We evaluated these tools for JPEG XL and I rejected them due to them only helping at very lowest bit rates

there are many other ideas on how low quality JPEG XL images could be made, but it seems that it is more of a theoretical question, since real use is always at relatively high BPP: humans are 1000x more expensive than computers, so human experience can be prioritized over computers working harder for us.


I'm rather sympathetic to the argument of targeting actually used BPPs and think the benchmarks should reflect that (so "low" should be something like covering ~90% of actual images rather than some more arbitrary number), as it's another point of confusion counting against this great new format. Though I don't quite follow your last point: how would human experience be harmed by allowing low BPPs?

> only complicate coding at bitrates above 1.5 or so

these features could be disabled at higher bitrates as they're not helpful there?


I think the point was probably more about complicating the codec by requiring even more features to be implemented, which might hinder adoption


Another practical reason was to keep the level of complexity manageable for us humans implementing and optimizing it.


At the cost of smoothing everything out, losing a lot of texture and detail. In some cases it does look better because the artifacts are smoothed over, but it’s not clear-cut. Especially with photography, where the ISO grain is often a part of the art, it’s often completely washed away.


Are you using squoosh.app? (That app is crap, as it uses default settings optimized for AV1 video.)


App? libavif provides avifenc; no need for anything beyond the command line.


On the other hand, AVIF smoothes out the fine details. It struggles to preserve the texture of the receipt and the tiny specks on the inside of the cup blend together.

It's visible even in qualities higher than "tiny", which IMO is unreasonably small.


For me the coffee details look better in AVIF; the spoon and receipt do lose some texture.

Frankly, I can't really see much of a difference between AV1/Tiny and JPEGXL/Large in the middle of the coffee one [1]; everything else around it (cup, receipt, spoon) clearly has more detail, but not the coffee in the middle.

- [1] https://imgur.com/a/PeHlyE4


The sweet spot for jxl is generally in the more "normal" bitrates


Very interesting, thanks for building/sharing this (I imagine you are the author of this page?). Questions I would have:

a. Is it possible to dig out the exact CLI command used for each of these encodings?

b. With libavif, did you test film-grain-test, and what are your findings, if any?

c. With libavif, did you test denoise-noise-level, and what are your findings, if any?

Of course I have no wish to put you into homework mode, so feel free to ignore this.


I am not the author.


That's a very photo heavy corpus. Which is fine if you're evaluating these formats for photo compression, but not very useful if you have a mix of photos, comics, memes, screenshots, etc.


I worked for a storied Japanese imaging corporation.

The final stage of their Image Quality QC was always "The Old Men With Good Eyes."

There were some employees who were professional "eyeballers." They had full veto power.

I'm not sure if they still do that, but it would not surprise me, if they did.


When you're designing a codec or an image processing pipeline, having a human in the loop is great. Designing codecs to have human-acceptable artefacts is as much art as science. Otherwise you end up like Xerox, whose scanners tended to compress "6" as "8", since the two looked close enough to their compression algorithm[1].

[1]: http://dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_...


That is also how JPEG XL quality was decided. I viewed every quality affecting change manually and if it didn't pass, it didn't pass - no matter what objective metrics said.


Awesome!

Thanks for your good work!

What I like about JPEG XL is the alpha-channel support.


Yeah, whenever I see these image comparisons, I more often than not find the result of WebP or AVIF undesirable even against a good JPEG compression. WebP and AVIF tend to blur out detailed textures too much and just look worse than JPEG, which, while a bit rough with some artifacts, still lets you somewhat see the detailed textures.


Interestingly, in the latest FF on Linux, the colors are for some reason washed out for me with AVIF and JPEG XL but not the other formats. They look fine in Chromium though.

When zooming in 3x, I can say that AVIF looks significantly better than JPEG XL at the lower quality settings, while being consistently a little bit smaller in file size. There is better color retention, less noise, and it looks sharper. At the "tiny" setting the difference is night and day. At the highest setting they are close enough that I wouldn't say there's a clear winner for me after going through all the images.


On the “Vid Gajsek” image posted elsewhere in the thread, AVIF consistently removes texture in the shadow between the receipt and the cup, even at size “large”. At size “medium”, it does so across the entire cup.

Disclaimer: JPEG XL contributor.


I am sure there are examples where JPEG XL outperforms AVIF. But looking through all those images in the test sample I get the impression that AVIF on average in the lowest quality setting is significantly better than JPEG XL.

I agree that the example you presented indeed has the mentioned shadow issue in AVIF.


If tiny quality were 1.0 bpp, low 1.5 bpp, medium 2.5 bpp, and high 4.0 bpp, JPEG XL would win in all categories.

In reality cameras produce about 3.5-4.5 bpp, and the internet uses an average of 2 bpp for photographs.


Realistically AVIF would mostly be used to reduce file size relative to JPEG, not increase quality.


It does, but the middle bit looks far better; AV1/tiny is comparable to JXL/large when looking only at the middle bit, weirdly enough. Everything around it is of course worse.

https://imgur.com/a/PeHlyE4


Agreed. Take the Steinway image, for example: JPEG XL makes a mess of it at the tiny setting, with lots of artifacts and much detail lost (e.g. the benches on the right, or the piano keys on the left).


> Objective metrics (like dssim) are not the best way to compare image codecs.

Yep. It is ultimately a subjective experience. I found this out when implementing a color management system back in the early to mid 90s.

There is a lot of math, and a lot of physics, of the light, of boundary conditions of ink, etc. But there is also a lot of perception and psychology that is very difficult to capture.

As an example, my first attempts were quite accurate as measured, but looked horribly yellow-ish. The reason is that my software was successfully compensating for the blue-ish optical whiteners (UV+blue) in the paper. Which you can't do, because the eye will judge the brightest "nearly white" area in the field of view as white, and judge all the other colors relative to that.

But you also can't not do it, because then you ignore what the colors are actually supposed to be. And then it gets tricky...
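The white-adaptation effect described above is usually modeled as a von Kries-style transform: divide out the source white point and multiply in the destination white, channel by channel. A minimal sketch, done directly in XYZ for brevity (a real CMS would do this in an LMS cone space, and the paper measurement is made up for illustration):

```python
def von_kries_adapt(xyz, src_white, dst_white):
    """Von Kries-style chromatic adaptation, simplified to plain XYZ.

    Scales each channel by the ratio of destination to source white,
    so whatever was "white" under the source illuminant maps exactly
    to the destination white.
    """
    return tuple(c / s * d for c, d, s in zip(xyz, dst_white, src_white))

# D65 and D50 white points (XYZ, Y normalized to 1.0).
D65 = (0.9505, 1.0000, 1.0890)
D50 = (0.9642, 1.0000, 0.8249)

# Hypothetical paper with bluish optical brighteners, measured under D65:
paper = (0.92, 0.95, 1.08)
adapted = von_kries_adapt(paper, D65, D50)
print("adapted:", tuple(round(c, 3) for c in adapted))
```

The point of the anecdote survives in the math: after adaptation the brightened paper's blue channel is pulled down, so whether you "correct" it or not depends on what the viewer will treat as white.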


I looked through quite a few of these test images and it seems that AVIF is better overall than JPEG XL.

Sometimes it's really clear. E.g. for the "US Open" Image, AVIF "tiny" (23.6 KB) looks as good as JPEG XL "medium" (46.3 KB), despite the substantial difference in bit rate:

https://afontenot.github.io/image-formats-comparison/#us-ope...

In some other cases it's less clear, and sometimes AVIF denoises too aggressively, e.g. in animal fur. JPEG XL has more problems with color bleed. Overall it seems to me AVIF is significantly better, especially on lower bit rates.


I agree that subjective (human) testing is better than comparing using metrics, but the downside is that you can only look at so many images, and it depends quite a lot on the image content how the various codecs perform.

When doing a visual comparison, imo the best way is to start with the original versus a codec, to find out what bitrate you consider "good enough" for that image — this can vary wildly between images (on some images "small" will be fine while on others even "large" is not quite good enough). Then compare the various codecs at that bitrate.

There's a temptation to compare things at low qualities (e.g. "tiny") because there it's of course easier to see the artifacts. You cannot extrapolate codec performance at low quality settings to high quality though, e.g. it's not because AVIF looks better than JXL at "tiny" that it also looks better at "large". So if you want to do a meaningful comparison, it's best to compare at the quality you actually want to use.

At Cloudinary we did a large subjective study on 250 different images, at the quality range we consider relevant for web delivery (medium quality to near-visually lossless). We collected 1.4 million opinions via crowdsourcing in order to get accurate mean opinion scores. The results are available at https://cloudinary.com/labs/cid22.

One important thing to notice is that codec performance depends not only on the codec itself but also on the encoder and the encoder settings that are used. If you spend more time on encoding, you can get better results. A fair comparison is one that uses the best available encoders, at similar (and relevant) bitrates, and at similar (and relevant) encode speeds.

It's almost impossible to do subjective evaluation for all possible encoder settings though, or to redo the evaluations each time a new encoder version is released. This is why objective metrics are useful. There are many metrics, and some are better than others. You can measure how good a metric is by measuring how well it correlates with subjective results. According to our experiments, currently the best metrics are SSIMULACRA 2, Butteraugli 3-norm, and DSSIM. Older metrics like PSNR, SSIM, or even VMAF do not perform that well — probably indeed partially because some encoders are optimizing for them.
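The "measure how well a metric correlates with subjective results" step can be sketched with a rank correlation; the MOS and metric numbers below are made up purely for illustration (real evaluations use Spearman/Pearson/Kendall over hundreds of encodes):

```python
def ranks(values):
    """1-based rank positions; assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation via the classic d^2 formula (no ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Made-up mean opinion scores for six encodes, and two hypothetical
# metrics scoring the same encodes (higher = better for all three).
mos      = [2.1, 3.4, 3.9, 4.5, 2.8, 4.1]
metric_a = [30, 55, 62, 70, 41, 66]   # orders the encodes like MOS does
metric_b = [45, 50, 40, 60, 65, 55]   # orders them quite differently

print("metric A vs MOS:", spearman(mos, metric_a))  # 1.0
print("metric B vs MOS:", spearman(mos, metric_b))  # 0.2
```

A metric that ranks encodes the way human panels do (like metric A here) is the one worth optimizing against.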

Here are some aggregated interactive plots that show both compression gains (percentage saved over unoptimized JPEG, at a given metric score) and encode speed (megapixels per second):

SSIMULACRA 2: https://sneyers.info/benchmarks/tradeoff-relative-SSIMULACRA...

Butteraugli: https://sneyers.info/benchmarks/tradeoff-relative-Butteraugl...

DSSIM: https://sneyers.info/benchmarks/tradeoff-relative-DSSIM.html

(Note that Butteraugli and DSSIM are distortion metrics, not quality metrics, so "less is better")


True, but at least dssim is not the worst one to use.


Worth mentioning that the reference encoder only targets Butteraugli at effort 8 and 9; at effort 7 and below, it uses heuristics instead.



