AV1 beats x264 and libvpx-vp9 in practical use case (facebook.com)
296 points by runesoerensen on April 11, 2018 | 125 comments

Meanwhile, there are 2.8 MB of images in the article, with the header image alone being a 1.6 MB JPEG (!). They can be reduced by at least 50%, given that the deringing filter works fine on high-contrast text/charts. Better still, the optimal format for this type of image is PNG, not JPEG. After conversion and optimization, they take 889 KB [1], more than 3 times smaller, and doing the same on properly resized originals without JPEG artifacts could bring them down to 100-200 KB.

[1] https://github.com/vmdanilov/optimage/files/1898955/images.z...

Totally appropriate self promotion, I approve.

If that graph is a jpeg it kind of disqualifies the author in any discussion on compression.

> Our tests were conducted primarily with Standard Definition (SD) and High Definition (HD) video files, because those are currently the most popular video formats on Facebook. But because AV1's performance increased as video resolution increased, we conclude the new compression codec will likely deliver even higher efficiency gains with UHD/4K and 8K content.

This remark about higher resolution reminded me of something I've been wondering about frame rates, which are essentially also a higher resolution, except across the dimension of time.

Assuming low to negligible noise from the camera (or CGI, which avoids the problem altogether), shouldn't doubling from 30 FPS to 60 FPS produce less than double the size? The reasoning being that the changes per frame are smaller, leading to better predictions.

Is there any rule-of-thumb formula for expected size increase as frame rate increases?

The delta encoding will be more efficient with higher frame rate, but also x264 has a tweak which decreases quality factor as frame rate increases. Why? The reasoning was that, since the frame is displayed for less time, you don't have as much time to appreciate the details and hence fewer bits can be used, while still maintaining the same perceptual quality.

I found this out the hard way when encoding time lapses, even if the delta between frames is identical, the final choice of frame rate greatly affects output file size.

If all frames have lower quality wouldn't it still be perceived regardless of frame rate?

I'm no expert but I would say no. Each frame delta might have less/same information, but since you're applying more deltas/second, it gets better again.

Wouldn't this lead to bigger error accumulation? At least, my gut feeling says "less information per delta = larger error accumulation", and on top of that we have more deltas/second

Oooooof, now that is a good thing to know.

Speaking from experience with recording 120fps, which inevitably stutters on playback unless shown at half speed, I suppose also that it is a matter of "if the CPU can't keep up decoding, there is no point"

Increasing the frame rate increases the bitrate by less than 100%, because you only double the number of inter frames, not the number of keyframes, which weigh far more (generally between one and two orders of magnitude more).

If you look at this chart[1] (which I took from this[2] article), you can see that the first frame (a keyframe) is far bigger than the others.

[1]: https://people.xiph.org/~xiphmont/demo/av1/framebits.png

[2]: https://people.xiph.org/~xiphmont/demo/av1/demo1.shtml
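A toy model of the keyframe argument above, as a rough sketch: it assumes one keyframe per fixed-length GOP (here 2 seconds) and a single constant size for every inter frame. The function name and all frame sizes are hypothetical, not values read off the linked chart.

```python
def bits_per_second(fps, gop_seconds, keyframe_bits, interframe_bits):
    """Toy bitrate model: one keyframe per GOP, all other frames are inter frames."""
    frames_per_gop = round(fps * gop_seconds)
    gop_bits = keyframe_bits + (frames_per_gop - 1) * interframe_bits
    return gop_bits / gop_seconds


# Hypothetical sizes: a keyframe 20x larger than an inter frame, one keyframe every 2 s.
base = bits_per_second(30, 2.0, keyframe_bits=400_000, interframe_bits=20_000)

# Case 1: inter frames keep their size at 60 fps; keyframes per second are unchanged.
same_delta = bits_per_second(60, 2.0, keyframe_bits=400_000, interframe_bits=20_000)

# Case 2: assume inter frames shrink by 30% because consecutive frames differ less.
smaller_delta = bits_per_second(60, 2.0, keyframe_bits=400_000, interframe_bits=14_000)

for label, rate in [("same inter-frame size", same_delta),
                    ("30% smaller inter frames", smaller_delta)]:
    print(f"60 fps, {label}: {rate / base - 1:+.0%} vs 30 fps")
```

With these made-up numbers, doubling the frame rate adds about 76% when inter frames keep their size and about 31% if they also shrink by 30%; real content will land somewhere else entirely.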

YouTube recommends a 50% higher bitrate for 50/60FPS compared to 25/30FPS.


Note that these are upload settings. YouTube re-encodes all uploaded videos, so the recommended bitrates are a bit higher than what you'd typically end up with after YouTube has re-encoded the sources.

I don't know any math, but speaking from experience streaming it's around ~30% more.

Thanks for the datapoint, even if it is anecdotal!

As colde points out in the other reply, the type of content matters so: what kind of content do you stream? Games? If so, which genre(s) (as in: a 2D platformer might compress much better, given the simpler motion, not to mention minimalistic graphics that are more common there)

Yes, generally it should be less than double the size. How much less depends a lot on your content. If it's a talking head, it should be very efficient; if it's action-movie content, maybe less so. I haven't found a good rule of thumb to apply generally, though.

"Our testing shows AV1 surpasses its stated goal of 30% better compression than VP9, and achieves gains of 50.3%, 46.2% and 34.0%, compared to x264 main profile, x264 high profile and libvpx-vp9, respectively."

This sounds like AV1 is actually worse than H.265, at least at low-ish resolutions. Am I misreading these results?

For context, I've moved to using H.265 (via the x265 encoder) for archiving 720p and 1080p video. At least for low bitrates (in the 500-1500 kbit/s range) it gives me over 50% smaller filesizes compared to x264 high profile at roughly the same quality. The encoding speed is much slower, of course, but not by more than 10x on modern CPUs, typically less. I had hoped to move over to some AV1 implementation in a few years, but these results make that look doubtful. Are the supposed advantages of AV1 over H.265 confined to ultra high resolutions and/or high bitrates?

Please note that this is a purely technical question. I don't care about the patent issues and for those that do, they've been covered at length in other comments already.

> This sounds like AV1 is actually worse than H.265, at least at low-ish resolutions. Am I misreading these results?

Yes. VP9 is basically on par with H.265 at standard resolutions, so if AV1 is 34% better than VP9, it is a lot better than H.265. H.265 also doesn't tend to scale as well down to smaller resolutions, whereas a lot of AV1's benefits seem to scale well across the resolution spectrum.

I'm sure a lot of the confusion about this comparison comes from the fact that a) we're not talking about UHD content, which is where H.265 was intended to shine, and b) Facebook probably has never had, and will likely never have, any interest in H.265 because it is simply too legally and practically cumbersome.

> Yes, VP9 is basically on par with H.265 at standard resolutions, so if AV1 is 34% better, then it is a lot better than H.265.

Is that really true, though? The most comprehensive test I've seen [1] led Netflix to conclude that H.265 was better than VP9 by about 20%, including resolutions all the way down to 360p. In my own experience and other tests I've seen, the gap seems even wider. There are a few use cases where VP9 is superior (most notably screencasts), but generally it seems to lag behind H.265 while also being much slower to encode.

[1] http://www.streamingmedia.com/Articles/Editorial/Featured-Ar...

Sigh. Just reading some of the comments here, and even some on Doom9, it turns out we haven't learned much over the past nearly 20 years?

Doom9 was started 18 years ago, and many of these points have been discussed many, many times since.

And yet people still treat PSNR and SSIM as the gold standard. Many here present them as "FACTS", even though they have been told repeatedly not to, and even after being given all the information about why. And this is happening even on HN. Maybe we don't have a problem with fake news at all; we have a problem with people not willing to learn and understand, who decide to believe whatever fits their narrative best. This isn't just about the royalty-free (not patent-free) AV1 codec, but also programming languages, etc. Sorry, I am going a little off topic.

I really like that you set patents aside as a non-issue, so we can throw away one question that has no right or wrong answer.

To answer your question: if your encoded copy is only viewed by you, then yes, you are the judge of whether it is of similar quality, not PSNR or SSIM or even VMAF. We haven't had enough time to fully test and evolve the VMAF metric, but so far it is at least a much better metric than PSNR and SSIM, which are used in EVERY AV1 test that brags about it being superior.
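Why PSNR gets singled out: it is a direct function of the mean squared pixel error, PSNR = 10 · log10(MAX² / MSE), so it cannot tell where or how the error shows up. The sketch below uses plain Python on tiny made-up 8-bit frames (the function name is just for illustration) and gives a uniform, barely visible brightness shift and a single localized speck exactly the same score. It only illustrates the blind spot; it says nothing about how VMAF behaves.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio between two equally sized 8-bit frames,
    given as lists of rows. Real tools work per plane (Y/U/V) and average
    over many frames."""
    squared_error = 0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        for r, d in zip(ref_row, dist_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


reference = [[100, 100], [100, 100]]
shifted = [[102, 102], [102, 102]]  # every pixel 2 levels brighter
speck = [[104, 100], [100, 100]]    # one pixel off by 4 levels
print(round(psnr(reference, shifted), 2), round(psnr(reference, speck), 2))
```

Both distortions have an MSE of 4, so both print 42.11 dB, even though they look nothing alike.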

It depends on who you ask, but anyone who hates patents or is in the AV1 camp will tell you that VP9 is already better than x265; as a matter of fact, they have been saying this since day one, before VP9 was any good. But as you pointed out, using the VMAF metric x265 is better than VP9, and VP9 has been tuned for PSNR and SSIM values, which are part of the VMAF calculation. This gives x264 and x265 a slight disadvantage, since neither has cared about PSNR since its inception.

I do admit AV1 will likely be better than x265 for encodes, given the immense complexity involved. But so far I think it is only marginally to slightly better: the 30% improvement doesn't mean you can get a 1.4 Mbps AV1 encode to look similar to a 2 Mbps x265 encode, at least in my view. You can test this yourself, assuming you have enough time and spare resources to play around with the insanely slow AV1 encoder.

I guess it depends on content and your infrastructure. Whenever I put anywhere near similar encoding time and bitrate into a VP9 stream and an H.265 stream, I tend to get very similar results with both.

Yes, but VP9 is free.

"compared to x264 high profile at roughly the same quality"

Compared how? With what kind of source videos?

Unless you are matching SSIM on similar content I don't think you can meaningfully compare your numbers to the ones in the article.
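Comparing bitrates "at the same quality" is usually summarized as an average bitrate difference between two rate/quality curves over their overlapping quality range (the Bjøntegaard-delta or "BD-rate" approach). The thread doesn't say exactly how the article's percentages were computed. The sketch below is a simplified version under stated assumptions: piecewise-linear interpolation of log(bitrate) instead of the usual cubic fit, made-up data points, and hypothetical function names.

```python
import math


def _interp(xs, ys, x):
    """Linear interpolation of y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the sampled quality range")


def bd_rate(anchor, candidate, steps=1000):
    """Average bitrate change of `candidate` vs `anchor` at equal quality.

    Each curve is a list of (bitrate, quality) points, with quality e.g.
    PSNR in dB or an SSIM/VMAF score. Returns -0.30 for "30% fewer bits".
    Simplification: log(bitrate) is interpolated piecewise-linearly rather
    than with the cubic fit of the classic Bjøntegaard method.
    """
    curves = []
    for points in (anchor, candidate):
        pts = sorted(points, key=lambda p: p[1])
        curves.append(([q for _, q in pts], [math.log(r) for r, _ in pts]))
    (qa, la), (qc, lc) = curves
    lo, hi = max(qa[0], qc[0]), min(qa[-1], qc[-1])
    if lo >= hi:
        raise ValueError("quality ranges do not overlap")
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        q = min(lo + i * h, hi)  # clamp against floating-point overshoot
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid rule
        total += weight * (_interp(qc, lc, q) - _interp(qa, la, q))
    mean_log_diff = total * h / (hi - lo)
    return math.exp(mean_log_diff) - 1


# Hypothetical rate/quality points (kbit/s, dB) for two encoders.
anchor = [(1000, 30.0), (2000, 34.0), (4000, 38.0), (8000, 42.0)]
candidate = [(700, 30.0), (1400, 34.0), (2800, 38.0), (5600, 42.0)]
print(f"{bd_rate(anchor, candidate):.1%}")
```

Here the candidate needs 70% of the anchor's bitrate at every quality level, so this prints -30.0%. With real encodes the curves cross and bend, which is why the choice of quality metric matters so much.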

Me, I just take x264 video I downloaded off the back of a truck and re-encode it to x265 with default settings in FFmpeg. It looks the same to me on my TV, but the file size (including audio) is typically 25% of the original. I realize this is the opposite of a carefully controlled codec experiment. Just saying that in practice it works pretty well.

If you're curious about x265 in practice, the pirate scene group PSA-Rips is doing a good job recoding stuff to x265. http://psarips.com/

To be a little fairer, you should transcode it to both x264 and x265 so that the outputs can be compared to each other (rather than comparing the transcoded x265 to the source x264 as you did, which isn't quite the same thing).

I did that experiment with a file I found online. The source x264 video stream was 1600 MB. My x264 re-encode was 290 MB, and the x265 one was 80 MB. (There are also significant savings from ffmpeg re-encoding the audio, which are included in the file sizes I reported.) In all cases I'm sure my result is lower quality, but still more than acceptable for watching TV with low expectations.

It's certainly not a precise comparison, you're right.

Nevertheless, compare for instance x264 and x265. Exactly how large the difference between the two is depends on the type of content, bitrate, resolution and other factors, but it's hard to find any case (encoding speeds aside) where x265 wouldn't be substantially better, to the point where we generally call H.265 a better codec.

I was expecting a similar kind of leap with AV1. This comparison may not be conclusive, but it looks like that's probably not going to happen.

AV1's competition is x265, not x264.

AV1 is a codec, not an implementation. Its competition is the upcoming H.266. VP9's competition was H.265, and libvpx-vp9's competition was x265.

The exact correspondence between the VPx-es and the MPEG codecs is not that relevant, because old codecs don't go away when new ones come out. In fact, hardware decoders are irrevocably baked into devices where consumption happens, so anything H.264 or newer becomes difficult to displace -- like Google's ham-fisted push [1] for VP8 and VP9 on YouTube for their own benefit, burning through people's batteries in software instead of serving H.264.

But I'd argue that the MPEG codecs roughly slot in between each tier of VP8, VP9, and AV1, because the MPEG codecs are tremendously flexible (as can be seen in Netflix's tests [2]), have dozens of competing encoders of different qualities, and dozens of profiles that constrain or unlock advanced features. So in a way, it can be simultaneously true that AV1's competition is both H.265 and whatever comes after H.265.

[1] https://news.ycombinator.com/item?id=13230676

[2] https://medium.com/netflix-techblog/more-efficient-mobile-en...

VP9 had more browser support than H.265, the codec it competed against. It was Apple's hamfisted attempt to hinder royalty free codecs for the web (among other web standards) that gave some users a bad battery experience. https://ngcodec.com/news/2017/10/21/why-we-are-supporting-vp...

That Google didn't force VP8 over H.264 is what led to the unfortunate situation with Firefox needing to ship a binary blob. I'm happy that Google stuck to its guns in the next round.

How is serving better quality with less bandwidth "ham-fisted"? Both Netflix and Google say the codec is better. If anyone was ham-fisted, it was Apple, who fought to stop free codecs.

He explained that in the same sentence. An imperceptible quality improvement is not worth an order of magnitude increase in CPU usage, and the network savings aren’t enough to make up for dropped frames on hardware which is more than a year old.

Netflix says VP9 is 20% worse than 265... so what are you referring to?

Care to spell this out in more detail for those of us who don't know the relation between x265/h265/codec vs implementation/etc.?

You use x26* to encode h26*.

The correction is correct, albeit a bit pedantic given the point of GP's comment.

It's the difference between a file format and a library used to write the format.

(but that explanation also glosses over the fact that containers are something else entirely... "data format" may be better than "file format" here)

Codecs are almost like programming languages. They are highly modular and flexible.

Implementations and encoders actually use algorithms to turn the media into something that is compatible with a codec.

That's interesting.

But how can an implementation be compatible with a codec, if a codec is not set in the stone -- i.e. if there are different versions of a codec?

It may help to think about video codecs as standards for how to decode images from a series of bits, not for how to encode them. Decoding is deterministic; encoding is not. You have to consider that these encoders are lossy, i.e. while encoding they modify the information so that it is more easily compressible. Different encoders will make different choices and produce different results at the same bitrate.

Obviously encoders are constrained by the fact that they must produce a series of bits that can be decoded according to a certain standard, so a new video format can enable encoders to compress information more efficiently (i.e. less perceptual loss of quality at the same bitrate).
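A toy illustration of that split, as a sketch only: the made-up "format" below is just a step size plus integer levels, and it is not any real codec's syntax. The decoder is fully specified, while two encoders choose different step sizes and produce different, equally valid bitstreams with different size/quality trade-offs. The function names and the bit-count proxy are hypothetical.

```python
def decode(bitstream):
    """The toy format's spec: a step size plus integer levels, sample = level * step.
    Every conforming decoder must produce exactly this output."""
    step, levels = bitstream
    return [level * step for level in levels]


def encode_with_step(samples, step):
    """One family of encoders: quantise each sample to the nearest multiple of `step`."""
    return step, [round(s / step) for s in samples]


def cost(bitstream):
    """Crude size proxy: total bits needed for the magnitude of each level."""
    _, levels = bitstream
    return sum(level.bit_length() for level in levels)


samples = [13, 50, 97, 120, 7]
for step in (8, 3):  # two different encoders, same format, same decoder
    stream = encode_with_step(samples, step)
    decoded = decode(stream)
    max_error = max(abs(a - b) for a, b in zip(samples, decoded))
    print(f"step={step}: cost={cost(stream)} bits, decoded={decoded}, max error={max_error}")
```

The coarse encoder spends 14 "bits" with a maximum error of 3; the fine one spends 22 with a maximum error of 1. A smarter encoder could pick the step per region, and the decoder would not need to change at all.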

That makes it clear, thanks!

Codec refers to the implementation.

Thanks. You are correct. I should have said video coding format.

Very pedantic. The competition is between real encoders not potential encoders.

Actually, AV1, H.265, and VP9 are coding formats, while the AOM AV1 reference software, x265, and libvpx-vp9 are codecs... while we're being pedantic.


That's also unlikely to be true. We saw both VP9 and AV1 come out within h.265's lifecycle (with at least 2 more years to go until the next spec is ready). Chances are there will be an AV2 coming out around the same time as the h.266 release.

But either way, the whole industry has already backed an open and royalty-free codec. The release of h.266 (if that's what they're calling it) will be pointless because it's highly unlikely anyone will adopt it. They may as well release it as open source if they did the work anyway.

No, VP9's competition is HEVC/H.265, VP9 ties for efficiency and wins on practical deployment on open platforms. AV1 is competing with all of these codecs, at every resolution, and at every complexity point, but especially with HEVC's successor planned for 2021.

This. Even if it takes another year to optimise its encoder and develop hardware acceleration, AV1 will be in the market for two years before MPEG's 2021 forecast.

HEVC's successor will need to be materially superior AND have compelling and final patent licensing terms in order to be remotely viable.

>materially superior

To AV2, that is. I don't expect this to stop at AV1.

As compressing more gets harder and harder, a significant enough improvement to tempt people into using a royalty-encumbered codec is unlikely to happen ever again.

Nobody's competition is x265 (HEVC) until its patent licensing situation is clarified.

Of the streaming services that currently do 4k or HDR, only YouTube isn't using HEVC.

Actually thinking about it again, I guess most 4k porn probably isn't HEVC, though for other reasons I suspect. But Netflix, Amazon, etc. all kept distributing HEVC when the licensing situation was even worse than it is now.

Not in terms of price. HEVC can cost more than $1 per unit. That's too high for things like the RPi.

The Raspberry Pi Foundation has previously supported MPEG-2 and VC-1 hardware-based decoding by selling additional licenses in their online store [1][2]. By making the licenses optional they didn't need to increase the cost of all Raspberry Pi devices unnecessarily.

There isn't any reason they couldn't do the same for HEVC or H.266 as long as the hardware support was there.

[1] http://www.raspberrypi.com/mpeg-2-license-key/

[2] http://www.raspberrypi.com/vc-1-license-key/

> There isn't any reason they couldn't do the same for HEVC

The practical problem with HEVC licensing is that it's complex and uncertain. There are three separate patent pools (MPEG LA, HEVC Advance, and Velos Media) and there are other licensors who aren't in any patent pool. The licensing situation is so bad that Leonardo Chiariglione, founder and chairman of MPEG, calls HEVC an "unusable modern standard":


A while back the x265 team pleaded with the industry to sort out the HEVC licensing problems, but that never happened:


HEVC licensing may be too risky for the Raspberry Pi Foundation. It's simpler to use H.264 and VP9 today and to look to AV1 for the future.

> There are three separate patent pools (MPEG LA, HEVC Advance, and Velos Media)

Four pools — also Technicolor.

The GPU/video block hasn't been updated since the RPi 1; I wonder what their plan is.

To the extent that is true it seems like x265 already lost the fight for Facebook's usecase of web video before they even ran the software.

I wonder about CPU decoding. I never cared about codecs, because every movie played at the proper frame rate on my 2013 MacBook Air. But recently I downloaded a 4K H.265-encoded movie, and it turns out that the MacBook just wasn't able to decode it at proper speed. I hope that AV1 will be more efficient.

The determining factor is hardware decode support, not efficiency. H.264 is worse than H.265, but if that movie was H.264, it would have played back flawlessly due to the Intel hardware decoder.

It would also play back flawlessly even without the hardware decoder, because H.264 is so fast to decode nowadays. If AV1 is fast enough it can allow people to use AV1 without buying new hardware.

The software decoder is not considerably slower than the VP9 decoder, which is pretty fast. If you want to test it, Firefox Nightly ships a pre-release AV1 decoder behind a runtime preference. There's still a chance that it can't keep up, of course.

Also, you've likely been using the hardware decoder on your machine for H.264, which takes even more load off the CPU. H.265 decoders also vary considerably in speed; I've noticed recently that ffmpeg decodes H.265 faster than it used to.


Encoding time is 667x that of VP9. How can it be used in production? 667 TIMES LONGER!

This is just the reference implementation. Its purpose is to serve as a check for production versions to verify they produce the right bits, and to make clear what all the protocol options are. Production performance will be comparable to software implementations of the other protocols; then hardware will be real-time soon enough, and there will be expensive faster-than-real-time hardware for transcoding services. If you have a lot of transcoding to do, you farm it out to them.

The performance of the first x264 software implementations was no better, although you never saw them.

> The performance of the first x264 software implementations was no better, although you never saw them.

Yeah, the JM was ridiculously slow. Didn't stop people from making fast H.264 encoders.

Netflix (who are part of the group developing this) have stated they'll roll it out in production once it's less than 5x slower than VP9 so we can assume they are planning on it getting at least 134x faster than it currently is.

Shouldn't they be more concerned about decoding speed than encoding speed?

I don't think anyone is worried about the decoding requirements, which will be higher than the previous generation but only a factor of 2 or so.

Obviously, they won't be able to roll out high resolution content to phones, TVs or set top boxes until devices support it with hardware, which will be a couple of years from now.

But, there's plenty of people in the world with decent desktop power and terrible bandwidth who can benefit immediately from squeezing better video through their connection at the cost of more CPU at both ends.

Generally you won't hear those people commenting on forums like this, but they show up in companies like Netflix's business plans. YouTube has produced maps showing which countries had greater usage due to VP9.

I think them accepting a 5x encoding slow-down means they are mostly concerned about decoding speed.

Because it doesn't matter when you serve the video 3 billion times. Keep in mind that Gangnam Style's views resulted in an int32 overflow.

For companies like Netflix, Facebook, YouTube and many more, it just makes more sense to absorb the increased encoding time.

In addition, this is just a test implementation that has yet to be performance-optimized. Also, there are currently no hardware encoders, which will give encoding performance another serious boost.

Valid point. But I'm skeptical about its adoption in the live streaming arena!

Xilinx's statement [1] hints at FPGAs getting deployed in datacenters to accelerate AV1 encoding, which should tremendously improve performance (it seems more likely to me that the quoted 10x improvement is against a mature/optimized encoder, although I could be wrong).

Google has published (or will publish?) a Verilog implementation, so in theory you could put that on an FPGA you have lying around.

[1] https://aomedia.org/the-alliance-for-open-media-kickstarts-v... (scroll down)

How about decoding performance, which is much more important?

Both are important, especially in the age of user-generated content and a 4K camera in every new smartphone.

Most people spend a lot more time decoding than encoding.

Yesterday's xiph.org thread had the same discussion

- for live-streaming, encoding is the bottleneck. If the encoder drops frames, all recipients drop frames.

- for other purposes, decoding is likely the "bottleneck", although even then: the slower and more CPU intensive an encoder is, the more times a video has to be downloaded and viewed to "break even" through the "savings" on the decoder side. So there still is a trade-off to be made with how slow an encoder is allowed to be.

Decoding is going to have to be supported in hardware for any sort of mass adoption anyway.

Isn't the same true for encoding?

I don't see hardware support coming in the next 2-3 years. And even then, not everyone is going to replace their hardware immediately. Good software performance is quite important.

The interesting bits for me:

"Our testing shows AV1 surpasses its stated goal of 30% better compression than VP9, and achieves gains of 50.3%, 46.2% and 34.0%, compared to x264 main profile, x264 high profile and libvpx-vp9, respectively."

"However, AV1 saw increases in encoding computational complexity compared with x264 main, x264 high and libvpx-vp9 for ABR mode. Encoding run time was 9226.4x, 8139.2x and 667.1x greater, respectively…"

Let that sink in:

'Encoding run time was 9226.4x, 8139.2x and 667.1x greater, respectively…'

The codec is good, but the encoder has a long way to go. This places it squarely in the realm of only worth it right now if you're serving the same video millions of times (someone graph this?).

It's a bit hyperbolic to claim that this is a 'practical use case', but maybe at Facebook/Netflix/YouTube's scale, it is. But it will be exciting to watch this space.
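On the "someone graph this?" question, here is a back-of-the-envelope break-even sketch. The 667.1x and 34.0% inputs are the article's AV1-vs-libvpx-vp9 figures quoted above, read as an average bitrate saving at equal quality; the dollar amounts are made up purely for illustration, and decode-side costs are ignored entirely.

```python
def break_even_views(base_encode_cost, encode_ratio, delivery_cost_per_view, bitrate_saving):
    """Views needed before delivery savings pay for a slower encoder.

    base_encode_cost:       cost of one encode with the faster encoder
    encode_ratio:           how many times longer the slower encoder takes
    delivery_cost_per_view: cost of streaming the faster encoder's output once
    bitrate_saving:         fraction of bits saved at equal quality (0.34 = 34%)
    Decoder-side costs (CPU, battery, hardware support) are ignored.
    """
    extra_encode_cost = base_encode_cost * (encode_ratio - 1)
    saving_per_view = delivery_cost_per_view * bitrate_saving
    return extra_encode_cost / saving_per_view


# Made-up costs: $0.025 per VP9 encode, $0.002 of bandwidth per view.
# 5x is the slowdown threshold attributed to Netflix in this thread.
for ratio in (667.1, 5.0):
    views = break_even_views(0.025, ratio, 0.002, 0.34)
    print(f"{ratio}x slower encode: break-even after about {views:,.0f} views")
```

With these invented costs, the current 667.1x slowdown needs roughly 24,489 views to pay off, while a 5x slowdown needs about 147. The break-even point scales linearly with the encode slowdown, which is why the encoder's speed matters far more than the exact bitrate gain.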

The same thing was true of H264 when it first came out. Reference encoders/decoders are always incredibly slow -- they're written to be correct, not fast.

> they're written to be correct, not fast.

AOM dev here. I'd love you to be right, but that's unfortunately not the case. They're written to improve VQ.

Regarding the "written to be correct" part of your sentence:

There's no emphasis on correctness. For months, aomenc has been accidentally using different transform sets for 8-bit and 10-bit. It also used to compute its RDO based on the legacy probability model (non-CDF), although the result was binarized using the new one (CDF). Moreover, serious regressions (the decoder can't decode the encoder's output, lossless mode becoming lossy) are introduced almost every day (!) and fixed the next day.

Now, regarding the "not fast" part of your sentence:

There's been a lot of work on SIMD (current aomenc performance is measured with SIMD optimizations enabled). In fact, the code features complex mechanisms (C header files generated from Perl scripts at configure time) to handle dispatching between the optimized versions. To make matters worse, almost all the pixel prediction code is physically duplicated in two versions: the generic code that uses 16-bit pixels, and the 8-bit optimized code that uses 8-bit pixels. All of this is actually a huge hindrance to performance-unrelated development.
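The run-time dispatch described above can be sketched roughly like this, in Python rather than C: each kernel has a reference version and one or more optimized versions, and the best one the CPU supports is chosen once at start-up. The feature names, the table and the function names are hypothetical, and the "SIMD" kernel is only a stand-in for a separate code path. The requirement that it match the reference bit for bit hints at why duplicated paths slow down unrelated development.

```python
def sad_c(a, b):
    """Reference ("C") kernel: sum of absolute differences of two pixel rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sad_simd(a, b):
    """Stand-in for a hand-vectorised kernel. A real one would process many
    pixels per instruction; here we only mimic a separate code path that
    must stay bit-exact with the reference."""
    total = 0
    for i in range(0, len(a), 4):
        total += sum(abs(x - y) for x, y in zip(a[i:i + 4], b[i:i + 4]))
    return total


# Hypothetical dispatch table, most preferred implementation first.
# "c" is the always-available fallback.
SAD_IMPLEMENTATIONS = [("avx2", sad_simd), ("c", sad_c)]


def select_sad(cpu_features):
    """Pick an implementation once (e.g. at start-up) from detected CPU
    features, so hot loops just call a plain function reference."""
    for feature, impl in SAD_IMPLEMENTATIONS:
        if feature == "c" or feature in cpu_features:
            return impl
    raise RuntimeError("no SAD implementation available")


sad_kernel = select_sad({"sse2", "avx2"})  # feature detection is assumed, not shown
print(sad_kernel.__name__, sad_kernel([1, 2, 3], [3, 2, 1]))
```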

What most people don't realize is that diminishing returns would be inevitable even if there wasn't a theoretical limit to compression. While it's true that computers are getting more powerful all the time, we're also getting more bandwidth and storage space all the time too. Even if it were possible to achieve results on a visual par with x264 in half the encoding size, the computational complexity would be so overwhelmingly high that it's not hard to see why in most cases you'd want to stick with the faster codec simply because of encoding time and power use on consumer devices.

It really wouldn't be that surprising if we're still using video encoders with efficiencies somewhere in the h264-AOM range 1000 years from now. Cases where the tradeoffs make higher efficiency worthwhile are hard to come by.

Many continue to use this comparison.

The H.264 reference encoder wasn't tuned for anything, not even quality. That's why x264 gave us not only much better speed but also higher quality.

And the reference encoder (JM) was only about 250 times slower when it first came out. It truly was a piece of software written to be correct rather than anything else, and there were some easy wins that made it 10x-20x faster within a matter of weeks.

Now AV1 is nearly 10,000x slower, so even if you manage to speed it up 100x, it is still 100x slower. The reference encoder right now is tuned for quality metrics: they were trying to hit PSNR, SSIM and even VMAF scores 30% better than VP9 at all costs. That is not the same situation as JM.

I don't doubt they could speed it up, but by how much? Some even say it is only usable by companies with scale; that is far from the truth. You don't gain much economy of scale once you pass a certain level of compute. What cost $1M to encode at Facebook would now cost $1B or $10B as it stands. People seem to have a real lack of sense of how much 1000x is, or even 10,000x. Your compute cost is going to be dictated by CPU manufacturers or silicon prices, and a 30% bitrate reduction is not going to save you money against that insane increase in compute. Especially if you are Netflix, with appliances sitting in ISPs' data centers.

Not long ago I asked how we are going to get the encoding speed down to a sane time, like 10x that of x264 at high quality. It turns out it will very likely have to disable certain tools that are being used now, which means a reduction in quality. But I believe it will still end up better than x265. The set of tools available in AV1 is quite massive.

Right now we have xvc, which looks more like an H.266 encoder and offers a 30% improvement over AV1 if you want to believe PSNR and SSIM. But it is even slower than AV1.

AV1 seems to allow rotating reference blocks in inter prediction. I wonder what an algorithm that searches for a prediction MV would look like (not even counting bidirectional prediction).

Even without rotation, MV search can get sophisticated and not exactly intuitive in a typical encoder. And with HEVC's "merge" mode already saving so many bits, there aren't that many bits left to gain from the predictor itself.
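For context on the MV search question, the simplest possible motion estimation is an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). The sketch below shows only that baseline: real encoders use faster search patterns, sub-pixel refinement and rate-aware costs, and nothing here models AV1's warped or global motion. The frame contents and function names are made up for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(cur, ref, top, left, size, search_range):
    """Exhaustive integer-pel search: try every displacement (dy, dx) within
    +/- search_range and return (best_sad, dy, dx) for the block of `cur`
    at (top, left), matched against `ref`."""
    height, width = len(ref), len(ref[0])
    target = get_block(cur, top, left, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the reference frame
            cost = sad(target, get_block(ref, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Hypothetical 8x8 frames: the current frame shows the reference content
# moved up by 1 row and left by 2 columns (edges are simply clamped).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, top=2, left=2, size=4, search_range=2))
```

This finds the displacement (1, 2) with a SAD of 0. Even this naive version costs (2·range + 1)² block comparisons per block, and every extra degree of freedom, such as rotation, multiplies the candidates.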

To get to “sane time”, you break up the video and have more than one encoder running at a time: http://cseweb.ucsd.edu/~gmporter/papers/nsdi17-excamera.pdf

That's only going to give you a 4x improvement on typical devices and is something HEVC can do too.

Why only 4x? Video can be subdivided into more pieces.

Because most devices are quad-core. After a certain point, context switches lead to diminishing returns.

Distribute across multiple machines.
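The chunked approach, reduced to scheduling arithmetic, can be sketched as follows: split the frames into fixed-size chunks and hand them to independent encoders, whether cores or machines. The frame counts and per-frame time are hypothetical. The model also ignores that each chunk normally has to start with a keyframe, which costs bits; the linked ExCamera paper describes a technique for avoiding much of that cost.

```python
def plan_chunks(total_frames, chunk_frames):
    """Split frames [0, total_frames) into consecutive (start, end) ranges."""
    return [(start, min(start + chunk_frames, total_frames))
            for start in range(0, total_frames, chunk_frames)]


def wall_clock_seconds(chunks, seconds_per_frame, workers):
    """Estimated elapsed time when chunks go to `workers` independent
    encoders (cores or whole machines), assuming every frame costs the same.
    Longest chunks first; each goes to whichever worker becomes free first."""
    finish = [0.0] * workers
    for start, end in sorted(chunks, key=lambda c: c[1] - c[0], reverse=True):
        i = finish.index(min(finish))
        finish[i] += (end - start) * seconds_per_frame
    return max(finish)


# Hypothetical job: 10 minutes at 24 fps, 10-second chunks, 2 s of CPU per frame.
chunks = plan_chunks(14_400, 240)
for workers in (1, 4, 60, 100):
    print(f"{workers:>3} workers: {wall_clock_seconds(chunks, 2.0, workers):,.0f} s")
```

With these numbers, going past 60 workers buys nothing, because one chunk is the smallest unit of work. Shorter chunks raise that ceiling but add keyframes.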

and for consumers, as long as the decoder is fast and cheap...

I'd assume encoding was largely software-based using a POC encoder while x264/vp9 video encoders probably already take advantage of hardware acceleration shortcuts evolved over many years.

> the main focus of current AV1 development is on speed optimization to make it practical for use in production systems

Remember: first make it work, then make it right, then make it fast. Seems they're only starting the 3rd step now.

No, it wasn’t hardware against software.

But, and it's hard to overstate this, software like x264 has many man-years of intricate and clever optimization work under its belt. Just look through some of the development history to see the kind of hoops jumped through and the low-level, clock-tick-counting tweaks that have been made.

Also, I believe in general the expectations around AV1 have always been set realistically in the sense most are expecting that it will likely always require more compute to achieve a significant portion of any superior compression.

Important as those caveats may be, clearly there is a long way to go and you may still legitimately ask how do we know a 3 orders of magnitude deficit can be made practical?

In truth we don’t know with certainty where it will end up, that is true. However the bet was not made by guessing. There has been lots of investigation and analysis of what kind of implementations may be possible including design review with hardware engineers for that side of things.

So essentially it’s designed to outperform the current state of the art, and some high stakes educated projections have been made that it will become practical within a timeframe that still makes AV1 competitive enough to be a worthwhile value proposition.

libvpx, the VP9 encoder library used in this test, has no support for any hardware encoder blocks for VP9 [1], so it does everything in software. There are ways [2] to compile some support into ffmpeg-with-libvpx that makes it able to invoke the hardware encoder in newer Intel CPUs (Skylake or newer) [3][4] (using vp9_vaapi) but it's doubtful that this was used, since their command-line switches indicate nothing of the sort.

x264 is a software-only encoder that provides no hooks into hardware acceleration. Their ffmpeg command line indicates that no hardware acceleration hooks of ffmpeg were used.

[1] https://github.com/webmproject/libvpx/blob/master/CHANGELOG
[2] https://gist.github.com/Brainiarc7/24de2edef08866c3040805048...
[3] https://cgit.freedesktop.org/libva/commit/?id=fb57f5c15e72c3...
[4] https://en.wikipedia.org/wiki/VP9#Hardware_implementations

But let's remember that x264 has had a lot of time to find and drop in SIMD optimizations, and CPUs have had time to start optimizing for it as a use case.

x264 runs circles around all the hardware-accelerated encoders, at least quality-wise.

I haven't tried AV1 yet, but the only way I was able to encode with x265 in a reasonable amount of time was to shard the job over multiple machines and then join the file back together at the end.
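A minimal sketch of the sharding step, assuming a fixed keyframe interval (`gop`) so that every chunk can start on a keyframe and be encoded independently. `plan_shards` is a hypothetical helper for illustration, not part of any encoder's API:

```python
from typing import List, Tuple


def plan_shards(total_frames: int, machines: int, gop: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges, at most one per machine.

    Every chunk starts on a multiple of `gop` (the assumed keyframe
    interval), so each machine can begin its chunk with a keyframe.
    """
    if total_frames <= 0 or machines <= 0 or gop <= 0:
        raise ValueError("arguments must be positive")
    gops = -(-total_frames // gop)       # number of GOPs, rounded up
    per_machine = -(-gops // machines)   # GOPs per machine, rounded up
    shards = []
    for m in range(machines):
        start = m * per_machine * gop
        if start >= total_frames:
            break                        # fewer GOPs than machines
        end = min(start + per_machine * gop, total_frames)
        shards.append((start, end))
    return shards
```

Each range would then go to a separate machine, and the encoded pieces get joined afterwards (for example with ffmpeg's concat demuxer). In practice, splitting at scene cuts is often preferable to a fixed interval, since encoders place keyframes adaptively.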

x265 gets ~10fps on veryslow with 480p video on a recent workstation. Intel and AMD are pretty much tied there because Intel's per-core performance is significantly higher for x265, but AMD has more cores.
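To put ~10 fps in perspective, here is a back-of-envelope estimate. It assumes a hypothetical 2-hour film at 24 fps and throughput that scales linearly with the number of machines, which ignores splitting and joining overhead:

```python
def encode_hours(duration_s: float, source_fps: float,
                 encode_fps: float, machines: int = 1) -> float:
    """Wall-clock hours to encode a clip at a fixed encoder throughput."""
    total_frames = duration_s * source_fps
    return total_frames / (encode_fps * machines) / 3600


# Hypothetical 2-hour film at 24 fps, encoded at ~10 fps.
single = encode_hours(2 * 3600, 24, 10)
sharded = encode_hours(2 * 3600, 24, 10, machines=4)
```

That works out to 4.8 hours on one workstation versus 1.2 hours spread over four, which is why sharding is attractive for slow encoders.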

x264 is a software codec.

>The codec is good, but the encoder has a long way to go.

True, but it hasn't been optimized yet. I'm pretty sure there's a lot of low-hanging fruit once they start converting hot code paths into hand-optimized assembly.

That said, no matter how much optimization it gets, it will be slower than equally optimized HEVC or H.264 encoders, given that AV1 is a more complex codec.

On the flip side, as the encoder gets better, the quality and bitrate will improve as well, not just the encoding time.

Netflix, yes; others, no. Most videos aren't played often enough to justify the extra encoding cost.

Certainly some videos on YouTube, and which ones should be pretty predictable.

It doesn't even need to be predictable, just reactive.
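The trade-off in this sub-thread can be made concrete: the extra encode cost only pays off once enough views have been served at the smaller size. Below is a minimal sketch of a break-even estimate plus the "reactive" policy of re-encoding a video only after its view count crosses that threshold. The `Video` record, its `has_av1` flag and all cost figures are hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Video:
    video_id: str
    views: int
    has_av1: bool = False  # hypothetical flag: an AV1 rendition already exists


def break_even_views(extra_encode_cost: float,
                     gb_saved_per_view: float,
                     cost_per_gb: float) -> float:
    """Views needed before bandwidth savings repay the extra encode cost."""
    return extra_encode_cost / (gb_saved_per_view * cost_per_gb)


def videos_to_reencode(videos: List[Video], threshold: float) -> List[str]:
    """Reactive policy: re-encode only videos that have crossed the
    threshold and do not have an AV1 rendition yet."""
    return [v.video_id for v in videos
            if v.views >= threshold and not v.has_av1]


# Placeholder numbers: $2 of extra compute per video, 50 MB saved per view,
# $0.02 per GB delivered.
threshold = break_even_views(2.0, 0.05, 0.02)
```

With these made-up numbers the threshold comes out at about 2,000 views, so a long-tail video watched a handful of times is never worth it, while a popular one gets picked up shortly after it takes off. No prediction is needed, only a periodic check of view counts.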

At least two new software encoders are on the way, and they are much faster than this while achieving the same (or better?) quality as libaom.

Do you have any more info on this?


Note that this is for the AV1 reference software encoder. It's not hardware-accelerated, and was probably built with limited regard for speed and, e.g., vectorisation.

Much worse than I thought. With this complexity increase, HEVC is by far the best choice.

For now - and even HEVC encoders are still pretty new.

Is this going to be supported natively on iOS and Android? Let's say I would like to start a new company that delivers video content to people. Could I use this today as the primary format for that? The performance numbers look great.

No, because there's no hardware decoder support for it yet in common chipsets. We're probably three years away from that, realistically.

From Doom9

Harmonic showed a comparison demo at NAB, pitting AV1, AVC, HEVC and JVET against each other. At the equivalent bitrate of 1.9 Mbps they got the following results:

Codec -- PSNR (dB) -- VMAF
AVC -- 32.7 -- 58
HEVC -- 36.7 -- 80
AV1 -- 37.2 -- 83
JVET -- 38.5 -- 88
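For reference, PSNR in the table is a plain pixel-error metric, while VMAF is Netflix's perceptual quality model (not reproduced here). A minimal sketch of PSNR for 8-bit samples; in codec comparisons it is often computed on the luma plane only, and the flat lists used here just stand in for a real frame:

```python
import math
from typing import Sequence


def psnr(ref: Sequence[int], test: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical inputs: no error at all
    return 10 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, the 0.5 dB gap between AV1 and HEVC above corresponds to roughly 11% lower MSE (10^(-0.05) ≈ 0.89), while the 4 dB jump from AVC to HEVC is about a 2.5x reduction.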

(Several have asked, but I haven't seen a good response yet.) What are the computational requirements for AV1 decoding compared to H.264, VP9, etc.? Is it the same order of magnitude, or multiple orders worse? Or lower?

They are nearly identical. But your current CPU does not have an integrated hardware decoder for AV1, so in practice it might be slower for now.

I don't know the true answer to your question, but since encoding is expected to get to within 5x of VP9 by the end of this year, I think decoding is of the same order.

The plan, of course, is in the coming years there will be decoders baked into the silicon so the computational complexity is offloaded from the CPU and therefore becomes (in some ways) irrelevant.

I wonder if anyone has tried rendering directly to AV1 (or x264, or what-have-you), e.g. in a video game. There is all this hardware devoted to turning a few bytes of input into lots of output pixels. Maybe you could do less work producing just the few bytes, and let dedicated hardware generate the pixels. Of course you don't need the CPU to generate the bytes; code running on the GPU would still do that, but with less work to do, you might get higher frame rates, or better detail at the same rate.

You heard it here first. (That is, unless it doesn't work.)

GPU pipelines deal with geometry and spatial vectors to compute final pixels, so arguing about video encoders at this stage misses the point - you can only start encoding after you have an image, not before. This could only be a post production step (basically what the Steam Link does, I guess). Generally it does not make sense, because you'd get a lower framerate that would have a higher latency. It would also be inferior in quality and consume more power.

Encoding into something that needs less bandwidth does make sense if you're sending the signal over a network connection, e.g. to your TV, but it makes no sense at all if you have a screen next to your computer.

Also, where do you get the impression that the input is small? A typical game nowadays comes in at multiple gigabytes, nothing small there; tons of high-res textures go into the rendering pipeline. If anything, the combined output is much smaller than the input.

You're essentially talking about abusing a video compression format for procedural generation - similar to concept that you can make a chatbot by running a text compression algorithm "in reverse".

It's a fun idea. I don't think it makes sense for video games, because video compression formats are designed for, well, video (i.e. 2D moving pictures - the videos that compress best with current tech tend to be animation, with large areas of flat colour that move around). It would be a good fit for moving-pictures animation, something with the kind of motion that 2000s-era Flash animations had (or Inferno Cop for a recent example) - but to be successful with that style you want either beautiful hand-drawn art (e.g. Child of Light) to make up for the lack of motion, or at a minimum you want clean vector lines. Video formats handle non-predicted parts more like JPEG, so you'd end up with a bunch of blocky and muddy shapes that moved around like a Flash animation - not an aesthetic that I imagine appeals.
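The point about flat colour and blockiness can be illustrated with the block transform at the heart of JPEG-style coding. This is a simplified 1-D sketch; real codecs, AV1 included, use 2-D integer transforms of several sizes and types on prediction residuals. A flat 8-sample block collapses to a single coefficient after quantization, while a sharp edge leaves several, which is why flat-colour animation compresses so well and why coarse quantization of detailed areas looks blocky:

```python
import math
from typing import List, Sequence


def dct_ii(block: Sequence[float]) -> List[float]:
    """Orthonormal 1-D DCT-II of a block of samples."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                               for i, x in enumerate(block)))
    return out


def nonzero_after_quant(coeffs: Sequence[float], step: float) -> int:
    """Count coefficients that survive uniform quantization with `step`."""
    return sum(1 for c in coeffs if round(c / step) != 0)


flat = [100] * 8               # a patch of flat colour
edge = [0] * 4 + [255] * 4     # a hard black-to-white edge
```

Only the first (DC) coefficient survives for `flat`, whereas `edge` keeps several nonzero coefficients that all cost bits; throwing those away to save bits is what produces the blocky, muddy look.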

For a demoscene-style "minimal filesize" way of creating visuals it would make sense, except that the graphics card already has a lot of hardware for rendering 3D scenes from geometric primitives, and those tend to look better. Ultimately, the only way to use video decompression hardware instead of generating pixels in general-purpose code is to ship a video file rather than an executable that renders video, and video compression has a long way to go to catch up with procedural generation. I see this pretty directly, as I sometimes render and encode videos (from MikuMikuDance): a few hundred kilobytes of models, textures and motions will inevitably result in an encoded video of hundreds of megabytes that still looks noticeably worse than the actual rendering.

> Maybe you could do less work producing just the few bytes

It takes more work to produce fewer bytes, because you have to pack the same information into fewer bytes. Entropy and all that.
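The "more work for fewer bytes" point is easy to see with an ordinary compressor. Here is a small sketch using Python's standard `zlib` module, where higher levels spend more effort searching for matches. Random input, which is already at maximum entropy, does not shrink no matter how hard the compressor tries:

```python
import os
import zlib
from typing import Dict, Iterable


def compressed_sizes(data: bytes, levels: Iterable[int] = (1, 6, 9)) -> Dict[int, int]:
    """Map each zlib level to its output size; higher levels search harder."""
    return {level: len(zlib.compress(data, level)) for level in levels}


text = b"the quick brown fox jumps over the lazy dog\n" * 500
noise = os.urandom(len(text))

text_sizes = compressed_sizes(text)    # repetitive: shrinks dramatically
noise_sizes = compressed_sizes(noise)  # random: stays at roughly its size
```

Timing each level (e.g. with `timeit`) on less trivial input typically shows the other half of the trade: the harder search costs more CPU time.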

Not if the information starts out in fewer bytes. Then, you're just rearranging the bits into renderable order. "Chicken" is a very small number of bits that would need to be expanded to more bits (leg down here, wing over there, beak up here) before handing them over to the decoder to make actual feathers. The representation in a game's scene list is overwhelmingly more compact than the input to a decoder would be, and most of it barely changes from one second to the next, as the pixels are all replaced dozens of times over.

The information does not start out in fewer bytes. It starts out in all the bytes that represent the 3D models of the entities you're interacting with in your game and the textures that cover them, at least. That's usually more bytes than needed to show a 32-bit 4K picture. The rendering process reduces all the jungle you have in your VRAM to a single frame of a few tens of megabytes that is briefly shown on your screen.

It seems like you don't know the basics of 3D graphics or the basics of video encoding. It is extra work, and all it would save is some of the bandwidth needed to pump the picture to the display, which is not an issue in 3D graphics (we have dedicated cables).
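For scale, here is a quick sketch of the raw numbers involved, ignoring blanking intervals and link-encoding overhead. A single 32-bit 4K frame is about 33 MB, and sending 4K at 60 Hz uncompressed with 24 bits per pixel needs roughly 12 Gbit/s, which dedicated display links such as HDMI 2.0 (18 Gbit/s) are built to carry:

```python
def raw_frame_bytes(width: int, height: int, bits_per_pixel: int = 32) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def raw_link_gbps(width: int, height: int, fps: float,
                  bits_per_pixel: int = 24) -> float:
    """Uncompressed active-pixel bandwidth in Gbit/s (no blanking overhead)."""
    return width * height * bits_per_pixel * fps / 1e9


frame = raw_frame_bytes(3840, 2160)   # one 32-bit 4K frame
link = raw_link_gbps(3840, 2160, 60)  # 4K60 at 24 bits per pixel
```

That bandwidth only becomes a problem once the signal has to cross a network, which is where streaming services come in.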

This is exactly what OnLive/Parsec/PlayStation Now do. Keep the video card in the cloud, stream to a screen. Works pretty well if you have a fiber connection, pretty hit and miss otherwise.

I thought PlayStation Now streamed content from PS4 to PS Vita. I may be messing up the names tho.

You're thinking of Remote Play. That's what they call their streaming thing from PS4 to PS Vita, though it does support Windows nowadays.

Games can be procedurally generated. Videos come from a source video that acts as a standard.

Compression standard finalized in 2018 is more sophisticated than one from 2004? Say it ain’t so.
