AVIF – AV1 based image format (aomediacodec.github.io)
176 points by lcnmrn on Apr 10, 2018 | hide | past | favorite | 41 comments

This is a format combining HEIF as the container with AV1 I-frames as the payload.

HEIF is the Nokia-developed, MPEG and ISO-blessed container for image sequences and transforms, which is published as MPEG-H Part 12, and is based on the good-old ISOBMFF, which you typically recognize as the ".MP4 container". HEIF's widest deployment right now is on Apple iPhones, where it's the new default capture format for still images and 'live' images (effectively, a short video on either side of a high-res still).

AV1 is the AOMedia-developed, Google/ex-On2/Xiph.Org/Cisco consensus video format that's the evolution of the VP8/VP9 line and Daala.

This format takes the nice royalty-free state-of-the-art MPEG container and puts a royalty-free payload in it, making it a useful best-of-both-worlds.

Comparing HEVC-based and AV1-based: there's a tech demo (that I've posted here...a lot now heh) at https://people.xiph.org/~tdaede/av1stilldemo/. Both AV1 and x265 intra look great. If the artifacts when you compare to originals bug you, remember this is just a test at a level where JPEG is painfully blocky, and you can go higher!

Visually, I think AV1 adds less DCT-ish noise, but tends to blur small, low-contrast details a little more than x265. (Compare the ends of tree branches on the default image, Crepuscular Rays, or faraway tiles and details in Mercado dos Lavradores. Haven't done objective measurements though--you may disagree.) I, personally, like AV1's tradeoffs better, at least from what I see there. Patches of ringing draw attention, whereas slight blurring is something we're used to ignoring since it comes from lots of things besides codecs; natural vision and camera images aren't completely sharp either. (Dunno why AV1's that way, but might be that it does more aggressive deringing with that filter Monty's posting about next.)

The photo caption on the comparison in this story suggested the same conclusion re the detail vs. noise tradeoff, for what it's worth: https://www.cnet.com/news/google-mozilla-av1-photo-format-co...

The paper on the deringing filter included subjective tests, and it suggests that test subjects liked its results more consistently than you might guess from the changes in objective metrics--small sample size though: https://arxiv.org/pdf/1602.05975.pdf

Sometimes you may want x265's bias towards more detail; in the Tiger image, AV1 "cuts" the whiskers near the top of the image shorter than x265 does. Tricky to 'teach' encoders that some detail is more important than others despite both being equally low size or contrast.

Given they're both a ton better than JPEG, the big advantage to AV1 for a random developer is the openness; we can use it w/out needing to either take a legal risk or lean on the implementation of some company that's dealt with the HEVC patent mess for us.

To my eyes and preference, both x265 and AV1's stills are far less blocky and far too smoothed than I'd like, but then again I've been an unabashed proponent of less smoothing [1], having lived through early H.264 encoders' blurry messes.

Once I get past that, some of the test images look better on AV1, and some look better on x265. Some of them kill detail here, some of them kill detail there. I think this is a victory -- both of them perform really well, considering, and allocating a bit more bits to get it looking nice is always an option.

It's good to have a royalty-free codec that's actually good, and with much more profile and fanfare than just a weird garage-band secret project like VP9 was. And with widespread hardware support, it may actually make a dent in JPEG's marketshare.

[1] https://hn.algolia.com/?query=niftich%20fake%20sharpness&sor...

I'm with you - I find both AV1 and x265 to be over-smoothed. On the other hand, at these compression ratios, you don't get something for nothing, and I certainly appreciate the lack of block artifacts.

The uppermost tree branches almost disappear with both AV1 and x265 when the background is cloud. JPEG manages to display them.

JPEG still trims the trees, just not as much.

In your demo, how are the images chosen for each image and algorithm combination? I mean how do you choose the compression level for each, to keep the comparison fair? I would hope you have a budget in KB and go as high quality as you can without going over that KB limit.

Not my demo, it's from someone at Mozilla, but, yes, the usual practice for comparisons is to match the size of the compressed content. I don't have context for that page, so I can't actually cite anything here.
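For what it's worth, the "budget in KB" approach can be sketched roughly like this. It's a minimal illustration, not how the demo page was actually built: `encode` is a hypothetical callable that takes a quality setting and returns the compressed bytes, `fake_encode` is a made-up stand-in, and the search assumes file size grows with quality, which real encoders only approximately satisfy.

```python
from typing import Callable, Optional, Tuple


def best_quality_under_budget(
    encode: Callable[[int], bytes],
    budget_bytes: int,
    q_min: int = 0,
    q_max: int = 100,
) -> Optional[Tuple[int, bytes]]:
    """Binary-search the highest quality whose output fits the byte budget.

    Returns (quality, data), or None if even q_min is over budget.
    Assumes output size is non-decreasing in quality.
    """
    best = None
    lo, hi = q_min, q_max
    while lo <= hi:
        mid = (lo + hi) // 2
        data = encode(mid)
        if len(data) <= budget_bytes:
            best = (mid, data)   # fits: remember it and try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1         # too big: try a lower quality
    return best


# Stand-in encoder for illustration only: 100 bytes per quality step.
def fake_encode(quality: int) -> bytes:
    return b"\0" * (quality * 100)


result = best_quality_under_budget(fake_encode, budget_bytes=5050)
```

Running the same search once per codec with the same budget gives files of nearly equal size, so any remaining difference is in how the images look.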

Isn't HEIF the payload and HEIC the container? (also C seems to stand for container in HEIC...).

It's confusing how people seem to use HEIF to mean container. And if HEIF is actually the container, then what is the payload in Apple's case, and what exactly is HEIC?

HEIF is the name of the container [1][2].

HEIC is an ISOBMFF 'brand' to be used inside the container to indicate that the payload's codec is HEVC (in 'Main' profile or 'Main Still Picture' profile), and that it's one still-image and not a sequence [1].

In most of the cases where the payload is the HEVC codec, the corresponding file extension is encouraged to be .heic and the MIME-type follows as well. Apple has a note about this in their transcript at WWDC 2017 [3].

[1] https://nokiatech.github.io/heif/technical.html [2] https://mpeg.chiariglione.org/standards/mpeg-h/image-file-fo.... [3] https://developer.apple.com/videos/play/wwdc2017/513
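To make the container-vs-brand distinction concrete, here is a rough sketch of reading the brands from the `ftyp` box that starts an ISOBMFF file. The box layout (32-bit big-endian size, 4-byte type, major brand, minor version, then a list of compatible brands) is standard ISOBMFF. The sample bytes are synthetic and built by hand for illustration, the helper name `read_ftyp` is made up, and the 64-bit and "to end of file" size forms are simply rejected.

```python
import struct
from typing import List, Tuple


def read_ftyp(data: bytes) -> Tuple[str, List[str]]:
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box."""
    if len(data) < 16:
        raise ValueError("too short for an ftyp box")
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("data does not start with an ftyp box")
    if size < 16 or size > len(data):
        # size 0 / 1 (special forms) and truncated boxes are not handled here
        raise ValueError("unsupported or truncated ftyp box")
    major = data[8:12].decode("ascii")
    # Bytes 12..16 hold the minor version; compatible brands follow, 4 bytes each.
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size - 3, 4)]
    return major, compatible


# Synthetic 24-byte ftyp box: major brand 'heic', compatible brands 'mif1' and 'heic'.
sample = struct.pack(">I4s4sI", 24, b"ftyp", b"heic", 0) + b"mif1" + b"heic"
major_brand, compatible_brands = read_ftyp(sample)
```

The container is the same either way; the brand is just a four-character label inside it that tells a reader which codec and constraints to expect.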

So it's basically what webp is to webm? Also how does it compare to my favorite image format, FLIF? http://flif.info/

It's more like a PowerPoint document with any number of WebM embeds and other files.

HEIF/AVIF is a complex kitchen sink of all the features that a mobile camera app may dream of, so it supports arbitrary collections of images and videos, e.g. "live" photos, tiles, bursts, subtitles, multiple layers, multiple versions.

So it's basically the new MNG?

FLIF has much better lossless compression than AV1 but much worse lossy compression.

About 95% smaller than FLIF I would think.

How is it 95% smaller? FLIF is a lossless compression, and AV1 in lossless mode can't do magic. It can't do 95% better.


FLIF is reported at 59.57% avg space savings

AV1 (2018-02-22) is reported at 66.02% avg space savings

presumably it's 95% smaller because it's not limited to lossless mode?
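As a sanity check on the two savings figures quoted above, here's the arithmetic, assuming both percentages are measured against the same uncompressed baseline (the thread doesn't say so; that's an assumption). The function name is made up for illustration.

```python
def relative_size(savings_a: float, savings_b: float) -> float:
    """Size of B's output as a fraction of A's, given savings vs. a shared baseline."""
    return (1.0 - savings_b) / (1.0 - savings_a)


flif_savings = 0.5957  # figure quoted in the thread
av1_savings = 0.6602   # figure quoted in the thread

ratio = relative_size(flif_savings, av1_savings)
print(f"AV1 output is {1 - ratio:.1%} smaller than FLIF's")
```

Under that assumption the gap works out to roughly 16%, nowhere near 95%.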

FLIF is also not limited to lossless mode, http://flif.info/lossy.html

And FLIF's lossy mode is as good as AV1's?

That part I don't know. I haven't seen a comparison between them, but it's compared well at least against BPG, or whatever the h264 based format was. I don't think I've seen a comparison against h265, let alone AV1. It'd be interesting to see. Even if they're about as good, I like FLIF's setup better since it enables progressive decoding which could be a big step up for slower connections.

I found no mention in the document of CMYK color space or mandatory dpi attributes, both of which are severe pain points for PNG images.

Does anyone know about this?

If you’re working with CMYK images, use standard TIFF. Does everything anyone could require for distribution, other than bleeding edge compression options.

The print production world doesn’t need new file formats. It rarely operates in a bandwidth restricted environment. And any new image format would take years to be sufficiently supported by all relevant software packages.

For most images most of the time, conversion to CMYK should be a last minute operation that takes into account source and printer profiles. The only immediately obvious exception I can think of is graphic artwork which should be stored as vectors anyway (with embedded pixel graphics where necessary).

So my DPI use case was actually for saving screenshots but the CMYK usecase was for taking RGB or RAW photos and converting them to CMYK color space for future use in CMYK documents, having touched up the colors to match the constraints of the color space.

Presumably the AVIF format supports extensive metadata. Supporting DPI merely requires a single numeric piece of metadata to be stored. The only part which may not exist is a defined standard name or location for this, and read/write support in the major image and publishing apps.

This is all entirely in the hands of Adobe.

My point still holds for CMYK though — bandwidth optimisation is rarely a concern for intermediary files. In fact, it's common to see compressed files rejected for being "low quality" based on file size and with no regard for the actual image fidelity. In this industry, formats like TIFF are sufficient.

If one was to propose a replacement for TIFF, it would need to support a lot more than just CMYK. Off the top of my head, it would need full alpha, embeddable profiles, 16 bit support, arbitrary channels, spot colours and vector clipping paths.

(As for CMYK conversion: even if you are touching up photographic colours for CMYK, this should be performed within the RGB space and soft proofing. The final conversion to CMYK should occur at print-time unless there's an excellent reason otherwise, e.g. if you are applying CMYK-inspired graphic art effects.)

Representing DPI in the real world actually requires at least two if not more floating point numbers.

The print/prepress industry has a long history of devices with asymmetric resolution, or higher resolution in monochrome versus color (similar to 4:2:0 pixel format used in digital video).

Considering professional print/prepress machines last 20 or more years in-service you have to handle just about everything.

All true, but irrelevant for images. All the number does is indicate the intended physical dimensions of the image.

For example, an image with pixel dimensions of 1800 by 1200 and a “DPI” metadata value of 300 will print with a physical dimension of 6.00 by 4.00 inches.

You misunderstand: the file format itself needs to have at least horizontal and vertical DPI values to be generally useful.

Many film scanners, for instance, are say 2400 dpi vertically and then some other DPI value horizontally, depending on the film or picture format. Anamorphic lenses are used to scale a wide-aspect image onto a 4:3 sensor.

Some scanners and printers even have different DPI for different color components.
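The arithmetic being argued about here is small enough to sketch. The function name is made up; the first call is the 1800 by 1200 at 300 DPI example from upthread, and the second uses invented numbers for a device with different horizontal and vertical resolution.

```python
from typing import Optional, Tuple


def print_size_inches(
    width_px: int,
    height_px: int,
    dpi_x: float,
    dpi_y: Optional[float] = None,
) -> Tuple[float, float]:
    """Physical size in inches; dpi_y defaults to dpi_x for square pixels."""
    if dpi_y is None:
        dpi_y = dpi_x
    return width_px / dpi_x, height_px / dpi_y


square = print_size_inches(1800, 1200, 300)               # single DPI value
asymmetric = print_size_inches(2400, 2400, 1200, 2400)    # hypothetical scanner
```

With a single value, the second axis has to be assumed equal to the first, which is exactly what goes wrong for the asymmetric case.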

But it’s a big deal to have that field be named. Like you say, literally anything can be put in the unnamed metadata fields, giving them no weight. PNG even has arbitrary metadata fields and that hasn’t translated to anything remotely resembling proper dpi support there.

Correct, which is why I said realistically this is all in the hands of Adobe. They could support DPI on PNG tomorrow if they wanted to, and their metadata label would become a de-facto standard overnight.

AV1 (and thus AVIF) appears to be designed for screens (works in YUV color space).

If you want to print, PDF is probably still your best bet.

AV1 can also encode RGB color space. Such usage would probably be more prevalent in the still image use case.

Actually I munged two issues. CMYK is for print but DPI is for retina images.

They have a metadata block type defined and you can shove whatever metadata format you want into it (EXIF, etc), and it has "image properties" for the necessary bits (presumably DPI would land itself as an image property, but I can see leaving colorspace out by assuming a "sane" default of something like sRGB).

Since there are plenty of reasons you don't care about this metadata (e.g. to reduce the transfer size on images that are requested millions or billions of times by a web browser), as much as possible it should be left optional.

But all of this seems to be pretty much draftware at the moment, so we'll see how things shake out.

You are correct: the specification is a work in progress which we plan to wrap up quickly. The approach is not to over-specify the format with features/options initially, but begin with a functional sub-set of HEIF and MIAF. The target is to provide an HDR/WCG successor to JPEG. This work is being done by a working group in the Alliance for Open Media.

The name will just lead to confusion. AVI is already a well-known and much-used video (container) format, with no relation to AVIF or AV1. As discussed in the AV1 discussion yesterday, AV1 will already cause confusion, and possibly coding and security errors. This name will lead to more of the same. Regular expressions, in particular.
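A small illustration of the regular-expression worry, using made-up file names: an unanchored pattern written for `.avi` also matches `.avif`, while anchoring the pattern at the end of the name does not.

```python
import re

# Unanchored: matches ".avi" anywhere in the name, including inside ".avif".
loose = re.compile(r"\.avi", re.IGNORECASE)
# Anchored at the end of the string: only a real ".avi" extension matches.
strict = re.compile(r"\.avi$", re.IGNORECASE)

names = ["clip.avi", "photo.avif"]  # hypothetical file names

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]
```

Here `loose_hits` contains both names and `strict_hits` only `clip.avi`, so code that routes files by a sloppy extension check would hand an AVIF image to an AVI parser.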

Facebook also posted some performance tests comparing AV1 to x264 and libvpx-vp9: https://code.facebook.com/posts/253852078523394/av1-beats-x2...

Does any of this translate to still images? How much better would it be compared to, say, a JPG? I assume somewhat better but it would have to be magnitudes to take over.

I was wanting this just a few weeks ago: https://news.ycombinator.com/item?id=16709626 I didn't realize it already existed. Amazing.

It's absolutely irrelevant what container it uses (HEIF/ios/mp4); moreover, from my experience it was pure pain in the ass to deal with all these boxes and the different ways different products produce them. I would never use it for an image format. Last time I checked it seemed like Nokia was a sinking ship, why would anyone board it?..

Keep it separate: make something that encodes image data separately and then use some container to wrap it nicely. Same shit that works for jpeg should be reused IMO, it just works (all that metadata), simply replace coded image data with a better encoder (AV1)

when I see that nokiatech "tech" where some dudes make these nice documents on how to pack images in mp4, all reaction I get: WTF?! don't see no tech in there. All that shit exists for videos, treat image like a video file that has a single first i-frame and nothing else, that's what AV1 in mp4 is all about. Is their tech about limiting possibilities with mp4? I didn't ever read their tech and wouldn't bother spending even a minute. Like a year ago I started reading it and was WTF is this bullshit, a doc on image container format!??.. well, still all different devices will produce shit load of random mp4 structures, what's the point, you'll always have to deal with that pain in the ass that comes with all these possibilities with mp4 and you basically need a full blown lib to deal with it you pretty much cannot code it manually to parse mp4 to get coded image and show on the screen.

Oh, look, another image format without support for stereo, cube layout, or MIP maps!

I mean, it's not as if media is going 3D and VR right this moment (or even in the last few years) or anything...
