In prior discussions about JPEG XL on here, some didn't realize that modern browsers already let you pick the "best fit" image from a set of images in basic HTML.
So if a site decided to support JPEG XL (including losslessly recompressing their existing JPEG library and, going forward, encoding new images at high quality in the better codec), they can easily support it in a way that lets the browser choose. No weird reverse proxies, machinations, or polyfills needed.
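A minimal sketch of what that looks like (file names here are placeholders): the browser goes through the <source> list in order and uses the first type it supports, and anything that understands none of them falls back to the plain <img>.

  <picture>
    <source type="image/jxl" srcset="photo.jxl">
    <source type="image/avif" srcset="photo.avif">
    <img src="photo.jpg" alt="Photo" width="1200" height="800">
  </picture>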
Another great aspect of JPEG XL is that it is a short circuit answer to almost any "what format should you store an image in?" question. Instead of the classic "is it comic book style? GIF or PNG. Photorealistic? JPEG. Need transparency....oof, is it photorealistic, because then things get ugly....", with JPEG XL it truly becomes the universal answer.
At least outside of absurdly high compression rates on photorealistic content where it does suffer.
> Unlike <img> tags, <video> posters have no standard way of specifying many URLs for fetching the optimal format supported by the client.
Actually, you can do it without needing any scripting, though the technique is convoluted: use as the poster an SVG image (encoded inline as a data URI), containing a <foreignObject>, containing an HTML <picture> element. It ends up roughly like this (with superfluous whitespace added):
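(A sketch of that structure; the file names and dimensions here are placeholders, and in real use the attribute value needs URL-escaping and the nested image URLs probably have to be absolute, since a data: URI has no base URL.)

  <video controls
         poster="data:image/svg+xml,
           <svg xmlns='http://www.w3.org/2000/svg' width='1280' height='720'>
             <foreignObject width='100%' height='100%'>
               <picture xmlns='http://www.w3.org/1999/xhtml'>
                 <source type='image/jxl' srcset='https://example.com/poster.jxl'/>
                 <source type='image/avif' srcset='https://example.com/poster.avif'/>
                 <img src='https://example.com/poster.jpg' width='1280' height='720'/>
               </picture>
             </foreignObject>
           </svg>">
    <source src="video.mp4" type="video/mp4">
  </video>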
> WebP seems to have about 10% better compression compared to libjpeg in most cases, except with 1500px images where the compression is about equal.[1]
Let's be fair: WebP lossless is pretty good, and it has a couple of advantages:
1. Almost always produces smaller files than PNG, even after PNG optimizers
2. Supported by all web browsers.
WebP lossless' disadvantage is that the maximum dimensions of the file are limited to 16383×16383. I run into images larger than that on a frequent basis; it's not a very high limit. It also only supports 8 bits per color channel. Good enough for screenshots, not good enough for some editors.
Lossless is a category of codecs where file size is almost the only factor anyone cares about. Pretty much all of them are fast enough that encode and decode speeds aren't worth measuring.
JPEG XL will perform even better in lossless mode than WebP lossless does, but it's not currently supported by any browser (it's in Safari beta and behind a config flag in Firefox Nightly).
>but it's not currently supported by any browser (it's in Safari beta and behind a config flag in Firefox Nightly)
Your comment is 100% accurate, but it's notable that JPEG XL is coming this fall across the release versions of macOS, iOS, iPadOS, and standalone Safari (itself compatible back to macOS 12). Not just in Safari, but in the OS's core media subsystem as well, meaning third-party apps can support it immediately and effortlessly.
Given the extremely rapid uptake of Apple OS updates, by year end it will have a substantial compatible base.
Of course webp has a massive compatible base right now yet sees almost no adoption. Hopefully the quality and functional benefits of JPEG XL finally get us over that hump.
You could see another comment I made on this story. :) WebP lossy has poor enough visual quality and poor enough file size gains over JPEG (especially after MozJPEG has really squeezed the most they could out of the 1993 format), that it's wholly unappealing to anyone that would possibly care about lossy formats. Google never really advertised that WebP lossless is a thing, so in the minds of many, WebP is a failed JPEG replacement and PNG stands unopposed (even though that's untrue for a subset of images PNG supports).
Here's a question I have for when e.g. Chrome supports JPEG XL: currently, if you have a <picture> with avif, webp, jpeg, it prioritises them in that order. But with JPEG XL added into the mix, which one is chosen by the browser? AVIF or JPEG XL? Will different browsers make different choices of priority?
And finally, how much longer will the HTTP "Accept" header get with these new formats?
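(For a sense of it: Chrome's image requests already send something along the lines of "Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8", so every newly supported format is one more entry in that list.)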
No, seriously, there’s not generally much point in supporting more than the latest fairly widely-supported format, and a universal fallback format. The advantages of the intermediate format are typically too slight and too transient.
If, for example, you deal with photos and currently serve AVIF, WebP and JPEG, around 80% of viewers will take AVIF despite the implementations all being under three years old (less than one in Apple-land), and that number will continue to climb, so that the WebP in this chain will be helping practically no one within two years. Moreover, adding WebP is already imposing a tiny cost for every viewer and a likely-noticeable cost for storage and maintenance, while frankly only slightly reducing the cost over JPEG for WebP-but-not-AVIF viewers.
It is not that clear. AVIF (and WebP) seems to be performing worse than JPEG at qualities above 90. That is likely 30-40 % of all the images on the internet. Why would these users want to send more bytes of AVIF than they use for JPEG today to maintain their quality choice? They would also lose lightweight decoding and progressive viewing while doing that.
That’s completely irrelevant to what we were talking about. We were discussing whether, if providing JPEG XL and JPEG, there would be any value in also providing AVIF or WebP.
> … qualities above 90. That is likely 30-40 % of all the images on the internet.
I don’t know whether it is, but it shouldn’t be anything like that. For browsing, a JPEG of quality q=90 is ridiculous overkill: that’s the kind of quality you should only get if you’re deliberately downloading high-quality images. A more commonly used figure is q=75, which produces files around 40% of the size of q=90, and most of the time q=60 (around ¼ of the size of q=90) is entirely adequate (perceptually close enough to be indistinguishable).
I would also expect that such excessive-quality images would be found primarily in systems that don’t support multi-format serving.
I should clarify that when I say “a more commonly used figure”, I mean “among things that have put any consideration into optimisation”. Where not controlled deliberately, tools tend to use the quality of the source image (which is probably around q=90 to q=94), or choose an unnecessarily high value like q=90. But take tools that have put at least some effort into sanity, and you find things like: https://squoosh.app/, a human-friendly tool for manual image optimisation and conversions, defaults to q=75 on JPEG; and the Zola static site generator defaults to q=75 for JPEG <https://www.getzola.org/documentation/content/image-processi...>; and Sharp, a Node.js library used by eleventy-image, defaults to q=80 on JPEG <https://sharp.pixelplumbing.com/api-output#jpeg>.
The considerable majority of images on the web are way higher-quality than they need to be.
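If you do want to set the quality deliberately rather than inherit a tool's default, it's usually a single flag; with ImageMagick, for instance, something like "convert photo.png -quality 75 photo.jpg" does it.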
I like the images to be at or above quality 94, d1.0 or smaller in jpeg xl. Some other user cares less. Some user thinks that quality 75 is as they came from the camera and it cannot be helped.
Objective metrics (like dssim) are not the best way to compare image codecs. This is in part for the obvious reason - that none of them are perfect and the human eye is better - but also for other reasons, including that some codecs use metrics internally for bitrate targeting, and these metrics can disagree with the metrics used by other codecs. Without a visual comparison, it's impossible to say which metrics are better for a given set of images.
JPEG XL was based (in part) on Google's "pik" codec, which used the Butteraugli metric for bitrate targeting.
Objective metrics have many weaknesses and pitfalls, but keep in mind that just eyeballing images is not necessarily better.
Non-blind n=1 human test can be just as flawed. What you like in a particular scenario is not representative of codecs' overall performance.
Testing with humans requires proper setup and a large sample size (which BTW JPEG XL has done!)
The problem is that these codecs are close enough in performance to be below the "noise" level of human judgement.
You will not be able to reliably distinguish q=80 vs q=81 of the same codec, even though there is objectively a difference between them. And you can't lower quality to potato level to make your job easier — that changes the nature of the benchmark.
People also just differ in opinion on whether blurry without detail is better than detailed but with blocking or ringing. If you ask people to rate images, the stddev of their scores will be pretty large; so wide that the scores of objective metrics can fit within 1 sigma.
People also tend to pick "nicer" image, rather than the one that is closer to the original. That's a bit different task than codecs aim for.
Codecs can allocate more or less of the file to color channels, so you can get different conclusions based on e.g. amount of bright red in the image.
So testing is hard. Plenty of pitfalls. Showing a smaller file that "looks the same" is easy, but deceptive.
> People also tend to pick "nicer" image, rather than the one that is closer to the original. That's a bit different task than codecs aim for.
Samsung sets their screens and phone cameras to “vivid” processing by default (they do provide a “natural” toggle, to their credit).
I think over 90% of people don’t realize what they are seeing is not close to reality, and instead ask people with other phones, especially iPhones, why their screen or photo looks so washed out.
There are many faults to BDFLs, but I love Apple providing “vivid” processing as an option and “natural” as the default. Sometimes the masses just don’t know what’s actually good for them.
The question with photos is whether people want something close to reality, or whether they'd rather have an exaggerated, arguably more beautiful version of reality to remember and share. I guess unless you're doing journalism, arguing for less processed photos may be a moot point, at least for phone cameras, as those are likely overwhelmingly used for snapshots.
Agreed that large double-blinded surveys with trained participants are pretty much the gold standard, and that there's no true objectivity about e.g. blurring vs blocking.
> Showing a smaller file that "looks the same" is easy, but deceptive.
It's much better to show same-sized files and let the viewer assess their quality (this is what the comparison I linked does), but there are deceptive ways to do even this.
Hm, if you zoom in you can clearly see the difference between jxl and avif. The problem is rather the inconsistency of the results.
For instance, at medium quality JXL seems to be better at preserving fine details and structure, like the mark on the door or the traces on the lower part of the bridge, but AVIF appears to be better at preserving the clarity of complex details, like far-away windows, cars, a tennis racket.
Lossy codecs intentionally allow distortions that are too small to see with the naked eye (without zooming in). They're designed to operate at some "normal" viewing distance. If you zoom in, you defeat that technique.
If you actually wanted to compress images specifically for viewing zoomed in, you should use different codecs, or higher quality, or configure codecs differently (e.g. in classic JPEG, make the quantization table preserve high frequencies more).
But for a benchmark that claims to compare codecs in general, you should only use normal viewing distance. Currently it's controversial whether the norm is still ~100dpi or whether it should be the "Retina"/2x resolution, but definitely it's not some zoomed-in 5dpi.
If you used nothing but nails to build various styles of beds, you wouldn't have the same experience sleeping on them as on the originals; unless, of course, the original was a bed of nails.
Sharp edges are just one texture in an infinite range of textures, and AVIF looks like it constructs everything out of sharp edges in a way that's really obvious to the eye at all compression levels.
With JPEG XL I can at least tell what's missing, or too artifacted to make out. With AVIF you have no idea what has been completely erased.
I think this is definitely the most common response. AVIF, as a video intra-frame based codec, works best at very low bitrates. JPEG XL is considerably better at high bitrates.
I'm guessing the reason is that when predicting video frames, hallucinating detail is undesirable, so you would rather remove detail than add non-existent detail. AVIF also seems to have some kind of deblocking filter which JXL lacks, to my surprise.
AVIF's deblocking filter works one axis at a time, whereas JPEG XL does an axis-non-separable filter, a 2D selection at once. It is not clear that AVIF can be parameterised to do filtering similar to JPEG XL's -- at least it hasn't been done yet.
This may simply be a case of each codec having its own strengths. However, I also wonder whether part of the issue is that the "small" compression size you're linking to in these examples generally isn't good enough; at that point you're just trading which kinds of artifacts you'd rather have - and in general jxl doesn't appear to do as well as avif at low quality settings.
In any case, compared to even the fairly good (for jpg) mozjpeg encoder, it's clear both of these codecs are much better than the status quo, and not that different from each other - neither wins universally against the other, but both pretty clearly do against jpeg.
A fairly simple heuristic seems to be: if you want images at the tiny size, pick avif. At small, pick avif unless you really, really want to preserve texture over detail. At medium, pick jxl for texture and avif for detail; and at large, pick jxl.
Browsing through these images in general, I think I'd usually pick jxl at medium or even sometimes large settings; small simply has too many artifacts in general (but if I had to use that - avif), and at better quality I (personally) find the distortion to texture more noticeable than loss of detail. I guess it depends on how important compression ratio is to you?
Photographic JPEG images on the internet are typically around 2.0 to 2.5 BPP, while current practice for WebP and AVIF is around 1 to 1.7 BPP.
People on the internet don't like to store photos at 0.5 BPP even with the latest and greatest codecs; it gets too blurry and artefacty.
This is not a statement of my personal aesthetic opinion but observing what gets done out in the wild.
Usually we store images at 3.5 - 5 BPP (Cameras), 10+ BPP (Raw or similar for editing) and 1-2 BPP for internet use.
The actual bitrates depend a lot on the image -- graphics with simple backgrounds need less, photos with a lot of sky need less, and busy, detailed images (particularly nature) need more.
While there is one 1.0 in the test, it is for an extremely busy image which would be better stored for internet use at 3+ BPP.
0.22 BPP is almost never used for photographs on the internet.
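For a rough sense of scale, a 12-megapixel photo at 2.5 BPP works out to 12,000,000 × 2.5 / 8 ≈ 3.75 MB, while the same image at 0.22 BPP would be only about 330 KB.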
You're probably aware of this, but the comparison tool allows selecting larger images. That first image is 3 bpp if you select "large".
> Usually we store images at 3.5 - 5 BPP (Cameras)
That's appropriate for JPEGs, but given that more recent compression algorithms do a better job, it's probably worth looking at lower bitrates. I pulled some JPEGs off my Canon DSLR for example, and they're around 2-4 bpp for landscape photos.
It's not surprising to see acceptable JPEG XL images with half that bitrate.
much of the gains of AVIF at lowest qualities come from features that don't exist in JPEG XL: wedges, large-support of the blurring filter, directional prediction
these features are non-helpful at normal photography bitrates and only complicate coding at bitrates above 1.5 or so
JPEG XL has similar approaches but its tools have a larger quality operating range
We evaluated these tools for JPEG XL and I rejected them due to them only helping at very lowest bit rates
there are many other ideas on how low quality JPEG XL images could be made, but it seems that it is more of a theoretical question since real use is always at relatively high BPP: humans are 1000x more expensive than computers, so the human experience can be prioritized over making the computer work harder for us.
I'm rather sympathetic to the argument of targeting actually used BPPs and think the benchmarks should reflect that (so "low" should be something like covering ~90% of actual images rather than some more arbitrary number), as it's another point of confusion counting against this great new format.
Though I don't quite follow your last point - how would human experience be harmed by allowing low BPPs?
> only complicate coding at bitrates above 1.5 or so
these features could be disabled at higher bitrates as they're not helpful there?
At the cost of smoothing everything out, losing a lot of texture and detail. In some cases it does look better because the artifacts are smoothed over, but it's not clear-cut. Especially with photography, where the ISO grain is often part of the art, it's often completely washed away.
On the other hand, AVIF smoothes out the fine details. It struggles to preserve the texture of the receipt and the tiny specks on the inside of the cup blend together.
It's visible even in qualities higher than "tiny", which IMO is unreasonably small.
For me the coffee details look better on AVIF; the spoon and receipt do lose some texture.
Frankly I can't really see much difference between AV1/Tiny and JPEGXL/Large in the middle of the coffee one [1]; everything else around it (cup, receipt, spoon) clearly has more detail, but not the coffee in the middle.
Very interesting, thanks for building/sharing (I imagine you are the author of this page?). Questions I would have:
a. Is it possible to dig out the exact cli command used for each of these encodings?
b. With libavif, did you test film-grain-test and what are your findings if any?
c. With libavif, did you test denoise-noise-level and what are your findings if any?
Of course I have no wish to put you into any homework mode, so feel free to ignore this.
That's a very photo heavy corpus. Which is fine if you're evaluating these formats for photo compression, but not very useful if you have a mix of photos, comics, memes, screenshots, etc.
When you're designing a codec or some image processing pipeline, having a human in the loop is great. Designing codecs to have human-acceptable artefacts is as much art as science. Otherwise you end up like Xerox that had scanners which tended to compress "6" as "8", since they looked close-enough to their compression algorithm[1]
That is also how JPEG XL quality was decided. I viewed every quality affecting change manually and if it didn't pass, it didn't pass - no matter what objective metrics said.
Yeah, whenever I see these image comparisons, more often than not I find the result of webp or avif undesirable even against a good jpeg compression. webp and avif tend to blur out detailed textures too much and just look worse than jpeg, which, while it looks a bit rough and has some artifacts, still lets you somewhat see the detailed textures.
Interestingly in the latest FF on Linux for some reason for me the colors are washed out for AVIF and JPEGXL but not the other formats. Looking fine in Chromium though.
When zooming in 3x I can say that AVIF looks significantly better than JPEGXL in the lower quality settings while being consistently a little bit smaller in filesize. There is better color retention, less noise and looks sharper. At the "tiny" setting the difference is night and day. At the highest setting they are so close enough that I wouldn't say there's a clear winner for me after going through all the images.
On the “Vid Gajsek” image posted elsewhere in the thread, AVIF consistently removes texture in the shadow between the receipt and the cup, even at size “large”. At size “medium”, it does so across the entire cup.
I am sure there are examples where JPEG XL outperforms AVIF. But looking through all those images in the test sample I get the impression that AVIF on average in the lowest quality setting is significantly better than JPEG XL.
I agree that the example you presented indeed has the mentioned shadow issue in AVIF.
It does, but the middle bit looks far better; AV1/tiny is comparable to JXL/large when only looking at the middle bit, weirdly enough, while everything around it is of course worse.
Agreed, take the Steinway image for example: jpegxl makes a mess of it with the tiny setting, with lots of artifacts and much detail lost (e.g. the benches on the right, or the piano keys on the left).
> Objective metrics (like dssim) are not the best way to compare image codecs.
Yep. It is ultimately a subjective experience. I found this out when implementing a color management system back in the early to mid 90s.
There is a lot of math, and a lot of physics, of the light, of boundary conditions of ink, etc. But there is also a lot of perception and psychology that is very difficult to capture.
As an example, my first attempts were quite accurate as measured, but looked horribly yellow-ish. The reason is that my software was successfully compensating for the blue-ish optical whiteners (UV+blue) in the paper. Which you can't do, because the eye will judge the brightest "nearly white" area in the field of view as white, and judge all the other colors relative to that.
But you also can't not do it, because then you ignore what the colors are actually supposed to be. And then it gets tricky...
I looked through quite a few of these test images and it seems that AVIF is better overall than JPEG XL.
Sometimes it's really clear. E.g. for the "US Open" Image, AVIF "tiny" (23.6 KB) looks as good as JPEG XL "medium" (46.3 KB), despite the substantial difference in bit rate:
In some other cases it's less clear, and sometimes AVIF denoises too aggressively, e.g. in animal fur. JPEG XL has more problems with color bleed. Overall it seems to me AVIF is significantly better, especially on lower bit rates.
I agree that subjective (human) testing is better than comparing using metrics, but the downside is that you can only look at so many images, and it depends quite a lot on the image content how the various codecs perform.
When doing a visual comparison, imo the best way is to start with the original versus a codec, to find out what bitrate you consider "good enough" for that image — this can vary wildly between images (on some images "small" will be fine while on others even "large" is not quite good enough). Then compare the various codecs at that bitrate.
There's a temptation to compare things at low qualities (e.g. "tiny") because there it's of course easier to see the artifacts. You cannot extrapolate codec performance at low quality settings to high quality though, e.g. it's not because AVIF looks better than JXL at "tiny" that it also looks better at "large". So if you want to do a meaningful comparison, it's best to compare at the quality you actually want to use.
At Cloudinary we did a large subjective study on 250 different images, at the quality range we consider relevant for web delivery (medium quality to near-visually lossless). We collected 1.4 million opinions via crowdsourcing in order to get accurate mean opinion scores. The results are available at https://cloudinary.com/labs/cid22.
One important thing to notice is that codec performance depends not only on the codec itself but also on the encoder and the encoder settings that are used. If you spend more time on encoding, you can get better results. A fair comparison is one that uses the best available encoders, at similar (and relevant) bitrates, and at similar (and relevant) encode speeds.
It's almost impossible to do subjective evaluation for all possible encoder settings though, or to redo the evaluations each time a new encoder version is released. This is why objective metrics are useful. There are many metrics, and some are better than others. You can measure how good a metric is by measuring how well it correlates with subjective results. According to our experiments, currently the best metrics are SSIMULACRA 2, Butteraugli 3-norm, and DSSIM. Older metrics like PSNR, SSIM, or even VMAF do not perform that well — probably indeed partially because some encoders are optimizing for them.
Here are some aggregated interactive plots that show both compression gains (percentage saved over unoptimized JPEG, at a given metric score) and encode speed (megapixels per second):
For me, the most interesting property of JPEG XL is that it allows lossless recompression of JPEG images. This seems a great way to save storage in a photo library without any loss of quality.
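For what it's worth, the reference tools make this a one-liner: something like "cjxl photo.jpg photo.jxl" should recompress the JPEG losslessly (that's the default behaviour for JPEG input), and "djxl photo.jxl photo.jpg" should reconstruct the bit-exact original.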
While I would prioritize other factors in a next-gen image format, the fact that JXL manages to lead in all those plus the ability to “upgrade library” is an incredible achievement.
Visual quality, file size, compression time, decompression time, browser support, toolchain support, patent restrictions. Considering any one of those in isolation is insufficient for evaluating image formats.
> Visual quality, file size, compression time, decompression time, browser support, toolchain support, patent restrictions. Considering any one of those in isolation is insufficient for evaluating image formats.
Additionally, use cases are critical for understanding which of these and other properties matter, as well as their relative importance. Authoring formats (e.g. Apple ProRes, DNG) are evaluated much differently than distribution formats (e.g. AV1, JPEG).
And not to forget the third category: storage/archival formats. JPEG XL is unlikely to get widespread browser support, but that doesn't mean it isn't a good format to store high quality originals. Either because they never touch a browser (my image viewer supports JXL) or because all images go through the resizing proxy anyways.
There's a neat table on https://jpegxl.info/ which shows all features of both formats. In general JXL has more features than AVIF, although some of them may be more important for special purposes like medical imaging.
Very good and informative table indeed, great base from which to start the codec comparison, pity it's an image and not fully interactive with extra info
You can pick and choose which facts to present, or misrepresent what is important, but... that's not what's going on here. JXL is actually just better by very many metrics. It's only lacking in adoption. There's not something that jpegxl.info is ignoring or glossing over. This is just lazy cynicism.
I was mostly joking, but you can compare a skateboard and a car and make the skateboard look better if you just mention weight and pollution but not top speed, autonomy, etc.
Well, ok, that's a fair point: from this table alone you can't say anything about computational complexity or implementation difficulty. And I cannot honestly say anything about that. I agree that it's a valid question to ask.
I'm afraid that your first reply misfired a bit if that was what you wanted to get across though.
Also 16-bit color support! The fact that AVIF does not have it is a bummer (although understandable, given that AVIF comes from a video codec and is not meant for editing HDR images).
Honestly that plus the awful tiling behavior at relatively low resolutions should have been enough to disqualify it from being mangled into something resembling an image format. It was made and tuned for movies, not stills. But its image format gets to inherit all of the drawbacks of that initial use case, including awful encode/decode performance.
AVIF supports "8-, 10-, and 12-bit color depths". It does not support 16-bit color depths (which is my claim made in the comment above). See https://en.wikipedia.org/wiki/AVIF
The need for 16-bit color arises when HDR content is edited, but it's unnecessary for just playback. AVIF is a playback format, so they didn't feel the need to support 16-bit color, I guess.
I think a bigger issue is that it apparently doesn't support images exceeding just 9 megapixels, making it unsuitable even for digital photography today. There is a technique of stitching multiple images together, but I'm not sure what other drawbacks this has.
Another problem is that it doesn't support progressive loading in the browser. Meaning the AVIF image only shows when it is fully loaded:
That looks really bad for the web and could make AVIF subjectively slower than most other image formats.
It's disappointing. There is a once-in-multiple-decades chance to establish a big new image format -- and then it doesn't even support some basic features JPEG had decades ago. If AVIF wins, we probably would have to live with its problems for ages. It reminds me of JPEG not supporting an alpha channel. 30 years later this blunder still causes headaches.
So yeah, AVIF seems to have better compression than JPEG XL, if the subjective examples are reliable, but at least the latter doesn't miss basic features, which seems more important in the long run. Alas, Google/Chrome favors AVIF, so JPEG XL seems doomed.
AVIF has progressive support in the spec, but it has not yet been implemented in an encoder and it is very likely going to be less dense than the non-progressive coding. My guesswork is that it will be ~10 % less dense than non-progressive once implemented, but it is just a guess based on my past experiences of other codec experiments trying to do such things.
Otherwise both formats have all features. JPEG XL is yuv444 by default whereas AVIF tends to favor yuv420 and some encoding/decoding hardware is going to be yuv420 only, so with AVIF one would often need to go software coding to get yuv444 quality.
- They would suffer from the same adoption difficulty hardware video encoding suffers from.
- They would suffer from the same quality issues and format limitations compared to software encoders (where intel/nvidia dGPU hardware AV1 maxes out at ~x265 medium last I checked)
Still it would be interesting to see it implemented. Sure it might be more limited in features compared to software encoders, but at least it could be an option when it's acceptable with software encoders being another one when more features are needed.
Adoption time is applicable to anything new(ish), so AV1 isn't really special in that sense. It's already close to reaching wide adoption in hardware, so it's progressing well.
HEVC wasn't ubiquitous right away either. But it's DOA because of patent trolls like MPEG-LA. AV1 came later, so it will take its time to become ubiquitous while avoiding issues of HEVC.
Apple is a brake on the progress, as always. They'll arrive last.
Comes with substantial compute costs at scale… I know of companies where the cost of supporting AVIF is many millions more than the cost of supporting JXL.
One of the biggest things I'm interested in is encode times. I have a static site where I automatically optimize and create a webp version of every image at build time, and it's always very fast. When I tried adding avif into the mix as well, build times absolutely skyrocketed. Obviously I was able to resolve that problem with caching, but it just all becomes much more of a pain to manage.
Note that AVIF is badly broken on the stable version of Gimp (2.10.34). Chroma is upscaled using NEAREST NEIGHBOR. Gimp does not do that when you load a JPEG or other lossy file which uses chroma subsampling.
Sadly, the chroma upsampling algorithm in AVIF has been explicitly left unspecified, because the video industry doesn't see a problem with that.
IMO using chroma subsampling with AVIF is always an error. That mode shouldn't exist, and it's there only due to video legacy. Just always use full-res color. AV1 has chroma-from-luma feature to deal with full-res chroma efficiently. If you destroy color resolution like it's 1953, you're just degrading both the quality and the compression for no good reason.
I tried to keep it away from pik (successfully) and jpeg xl (with moderate success), but in the end we needed to add yuv420/yuv422 for old school jpeg recompression (75 % of old jpegs are in this unfortunate mode).
Personally, I've seen JPEG XL source code and I'd rather not have it in my production, unless it's rewritten or at least security audited / Google fuzz-tested. It's a bit of a mess (the last time I checked was 6 months ago)
There are some valid critiques, like dssim being an old and questionable metric, and AVIF using unoptimized settings, and maybe JXL having an "unfair" advantage depending on how the source images (jpegs?) were ingested.
But this is a real world benchmark. If this is an example of how a seasoned backend dev would typically set up AVIF and JPEG XL, its extremely telling.
There are millions of old devices that are never going to support new image formats. Overall, the benefits of the new formats don't outweigh the negative aspects yet.
Not surprising, it's hard to compare image formats, it takes a lot of sample images before you can say anything, and even then it's hard to rule out sampling bias
The trouble is that images are a large fraction of the page size even for well optimised websites. Using smaller images via changing format often makes a significant difference in the page load time.
It's also relatively easy for new formats to be adopted - the html picture tag makes it really easy to serve the same image in multiple formats to allow for backward compatibility.
And of course image decoders are easily distributed - JXL's encoder/decoder are open source.
The combination of those two factors probably means that this will never be a settled area now. It's fairly easy for websites to adopt a new format and there are strong reasons to. Means, motive and opportunity.
I reckon Chrome will cave in eventually. The pressure on browser performance is intense. Unless they want to see Safari faster in benchmarks of image heavy pages they will have to adopt it.
> The trouble is that images are a large fraction of the page size even for well optimised websites. Using smaller images via changing format often makes a significant difference in the page load time.
That's true, but even savvy people don't seem to care much about this. For example, (open) podcasts are still universally MP3-based even though it's been possible for many years to halve their size using modern, ubiquitous audio formats.
> I reckon Chrome will cave in eventually. The pressure on browser performance is intense. Unless they want to see Safari faster in benchmarks of image heavy pages they will have to adopt it.
What's curious about this to me is why Apple is single-handedly saving JPEG-XL from obscurity when AV1 is also a fine substitute for mainstream image compression use cases.
MP3 is also good enough. Modern compressors are on par with Opus, AAC or whatever is in fashion now. The format is sane, patents have expired, it runs everywhere.
With Opus you'd get away with half the bitrate of MP3, or less, for a podcast. That gets to be somewhat significant given the length of many podcasts; those 64kbps or so that you save add up over time.
Though I agree, the "works everywhere" aspect is really the overriding factor: everyone can play it, all the services will accept it, etc. I think the only way you'd see it get used significantly is if a major service was transcoding to it, but I don't know that podcast bandwidth is a sufficient cost for anyone to bother.
HE-AAC is a bit better than Opus, plus has the benefit of MP3's "works everywhere" experience. I posted more detail elsewhere in the thread if you're interested.
xHE-AAC from 2016 (also known as USAC) yes. The older HE-AAC from 2003 and HE-AACv2 are not. Codecs have similar names, but they are different and released at different times.
Note that AAC (presumably they mean "Main Profile" rather than AAC-LC) has effectively the same efficiency as Opus. HE-AAC and HE-AACv2 have a higher efficiency than both Opus and AAC, and works great at lower bitrates in comparison to AAC.
> Note that AAC (presumably they mean "Main Profile" rather than AAC-LC) has effectively the same efficiency as Opus. HE-AAC and HE-AACv2 have a higher efficiency than both Opus and AAC, and works great at lower bitrates in comparison to AAC.
This chart just roughly outlines (according to the feeling of Opus developers at that time) what to expect from Opus - a wide range of useful bitrates. It's not anything that was actually measured or something that can be used for drawing any conclusions from it. I mean - those nice curves and lack of any detail about the codecs used should give it away.
According to public (double-blind) listening tests performed by the Hydrogenaudio group, Opus does win over the best HE-AAC codecs available at the time the tests were performed - both at 64kbps and 96kbps bitrates [1] (Multiformat Tests).
The podcast (and audio in general) thing annoys me way more than it should. FFS, just use Opus, it is supported on virtually all devices out there and is massively smaller
My guess is the fact that existing jpeg images can be transcoded to jpeg-xl losslessly is the driving feature. This gives an easy migration path for existing sites that don't have high resolution masters that can be re-encoded into AV1 or webp.
Also think of 20+ years of jpg digital camera photos that can now be transcoded to save space. For Apple that is also a huge win.
> What's curious about this to me is why Apple is single-handedly saving JPEG-XL from obscurity when AV1 is also a fine substitute for mainstream image compression use cases.
Apple does whatever they want. If they think something is better, they will go ahead and use it, the rest of the market be damned.
> What format would you recommend? It would need to be well-supported by podcast players.
HE-AAC. Support for HE-AAC and HE-AAC v2 has been universal on modern media platforms and operating systems for well over a decade.¹
• All versions of Android support HE-AAC v2 playback²
- Google also added encoders in Android 4.1 (2012)
• iOS introduced support for HE-AAC v2 playback in iOS 4 (released 2010)
• macOS introduced support for HE-AAC v2 playback with iTunes 9.2 (released 2010)
• I'm not sure when Windows added support, but it was available in Windows 8
• All open source players support HE-AAC v2 playback via FAAD2
FWIW, I distributed a reasonably-popular podcast (1.2M downloads over its lifetime) using HE-AAC v2 several years ago, and never received a complaint, or found a player it didn't work on.
¹ I read the other comments recommending Opus before responding, and although Opus is very nice, it's not as efficient or as ubiquitous as HE-AAC.
FAAD2 is in non-free repositories; for open source, it presents a problem with distribution outside of freeworld.
Neither Opus nor mp3 have this problem. So to maximize compatibility, mp3 is still the best choice, due to the attitude to Opus and other free codecs that Apple has.
> FAAD2 is in non-free repositories; for open source, it presents a problem with distribution outside of freeworld.
I may be misspeaking about FAAD2, but I've never run into an open-source player (like VLC) or library (like ffmpeg) which hasn't supported at least HE-AAC decode for a decade or more. If that's wrong, I'd love to be corrected in the most detailed way possible.
FFmpeg is exactly the kind of application that, in a non-pruned configuration (i.e. with codecs like AAC), distributions cannot legally distribute binaries of in some countries.
E.g. for Fedora, it is in the rpmfusion repository and not in the distribution proper. Other distributions have similar arrangements for the license-is-OK-but-patents-are-a-problem situation. These servers are outside the US (or other countries that recognize software patents), and for US users, the issue of obtaining the license is up to them.
The situation is so bad, that Fedora stopped shipping support for hardware acceleration of patented codecs (i.e. not complete codecs, but support to use the implementation in hardware, for example in your GPU), because they could be sued for contributory infringement.
Also note, that binaries for VLC or ffmpeg for Windows or Mac are similarly distributed from non-US servers, so basically the same situation as rpmfusion.
Note how AAC has effectively the same efficiency as Opus? HE-AAC and HE-AACv2 are notably better in comparison, and are usable at lower bitrates than AAC.
In cases where "the opposite is very well established", they're talking about AAC-LC. Citations that show otherwise are welcome! In any case, HE-AAC's universality is really beaten only by MP3 (which it trounces).
Opus is pretty widely supported (still some holes though, for example very few Adobe products support Opus), and it can sound better than MP3 at less than half the bitrate.
> The trouble is that images are a large fraction of the page size even for well optimised websites. Using smaller images via changing format often makes a significant difference in the page load time.
Pepperidge farm remembers Google heavily pushing for WebP for this reason. Look how much smaller it is! and Lighthouse giving demerits for every non-WebP image. Which, of course, is totally true, WebP images were almost always quite a bit smaller. They all looked like shit, too, of course.
> JPEG and PNG are good enough. Encoders and decoders are heavily optimized and omnipresent. No patent issues (either patent-free or expired).
I think you're getting downvotes for the sentiment, but I'm inclined to agree with both this comment, as well as the one you made a bit further in the thread.
> MP3 is also good enough. Modern compressors are on par with Opus, AAC or whatever is in fashion now. The format is sane, patents have expired, it runs everywhere. It's not worth the trouble.
Something like Ogg/Vorbis would also be under my consideration for audio files, but I find these established and somewhat out of date technologies to be surprisingly stable and reliable. The same way how for many folks out there the H.264 video codec will also suffice for almost any use case (sadly Theora never got big, but there's VP8/VP9 too). The same way how something like ZIP or 7z will be enough for most archiving needs.
I think there is no harm in using these options and not sweating about serving every asset in whichever of the modern formats the user's browser might support (nor about chasing perfect quality), or about converting the assets into a multitude of formats and including them all in the page, as long as 90% of your hosting costs aren't caused by this very choice. Who cares if a few pixels are a bit off because you saved your JPG with a quality setting of 75 or 80? As long as your application/site does what it should, that's hardly something that most people will care about, or even notice.
> We should focus on other more pressing issues. This one is basically solved.
However, eventually something new is going to come out AND get stable enough to have the same widespread adoption as the current stable and dependable options. Whether that happens in a year, 5 years or a decade, only time will show (let's say, 95% of devices supporting AV1 and Opus). I think that then will be the right time to switch to these new technologies for most people, as opposed to being early adopters.
Yes, I’m in a bad mood and it probably came across a bit fatalistic.
> However, eventually something new is going to come out…
The thing is, human eyes and ears aren’t getting any better and computing performance seems to be plateauing. My guess is we have already reached the “good enough” point and I doubt we’ll get formats 3-4X as efficient in the future.
> The thing is, human eyes and ears aren’t getting any better and computing performance seems to be plateauing.
My guess is that we'll ideally end up with something that isn't hard to use copyright/patent wise and offers "optimal" quality but at smaller file sizes when compared with the prior technologies, which would be a good thing for both storing your own video files, as well as when handling them en masse in most businesses. After all, if someone could reduce the bandwidth usage of their video hosting or streaming site by 10%, they'd be saving millions in hardware/networking expenses.
Though I don't think that we're at that point yet, many of these more recent technologies are not widely supported, or just not good enough.
For example, I rendered a ~6 minute 1080p video with Kdenlive with multiple codecs, to compare encode performance and resulting filesizes with the default settings:
I probably should have figured out a way to get similar file sizes, but either way you can see that VP8 and VP9 take way longer to process when compared to the version of H264 that's available to me. So the old H264 is indeed a good enough choice for me, at least for now.
I have a strong suspicion you are using hardware acceleration for H264, while the VPx codecs are all being encoded in software.
IMHO the most pressing issue I have with h264 is that it creates files that are almost always larger than HEVC (or AV1), and everything I own handles H265 without any issues (I think it's also mandated by some DVB standard).
Just as ZIP is a mediocre compression algorithm that is supported basically everywhere.
Today we have larger storage and faster bandwidth than ever before so it’s easy to trade off compression for ease.
Same goes with mp3. Back in the day, Microsoft was touting that WMA could replicate the quality of MP3 128kbps at just 64kbps. I bought in and mass-converted my entire library because of the big space savings. Now… I have a library of music that has so many artifacts in it, I can't listen to it. At the time, the space saving may have been worth it, but I now regret it.
Saying that DEFLATE is mediocre is absurd. There's a reason it's still in use after 30 years (and it's not merely inertia). Before zstandard arrived to the scene there was literally nothing able to achieve the same ratio as deflate without taking 10x longer.
This is unlike WMA/WMV which, I agree, were pretty much already obsolete by the time they came out.
You are overlooking how demand is still increasing, I believe. Faster bandwidth than ever before is met by higher demand than ever before, and it will continue like this for a while.
60fps HDR 4k Content will become normal, eating away at bandwidth availability, meaning that any possible savings will still be relevant, for example in still images.
For a start it's 2023 and all browsers do support lossless webp [1].
Lossless webp (webp can be both lossy or lossless) is not only typically smaller than most optimized PNG files but also compresses faster than the optimized PNG encoders.
So, basically, the main reason to use PNG in 2023 for anything Web related is if you like wasting bandwidth. Something like 99.95% of PNG files (some huge number like that) are smaller when encoded as lossless webp.
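For anyone wanting to try it, the reference encoder makes that a one-liner: something like "cwebp -lossless photo.png -o photo.webp".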
Heck, even my Emacs supports webp images.
Then other formats are already knocking on the door and promise even better compression than webp.
Unless you have a large image in which case you can’t even store it in webp because of its abysmal max resolution. PNG continues to be the format of record for lossless image storage. Hopefully JPEG XL can replace it.
I know about adding noise, but to call that a solution is silly. You’re purposely degrading the image to get around the technical limitations of a 30 year old format.
I disagree about the noise reduction part too. Modern (larger, not sure about cell phone) sensors have basically no noise at ISO 64-100 or so. It's just an inherent limitation of only having 255 steps of brightness.
Dithering is a feature. It’s used all the time in audio and image. And a little bit of noise is a good thing. Two of my favorite sources on the subject:
PNG only produces reasonable image sizes for a very restricted category of image. And while JPEG is decent for photos, it does a pretty bad job compressing images which contain sharp edges or text.
So neither is a good choice, unless you know a priori that all your images will fall into one of the categories where these formats do a decent job.