
Maybe this is not such a concern to the audience of this article, but at least for me, I'm stuck with h264 because VP9/AV1 encoding is really, really slow. I'd love to use the codecs that are open and technically better, but when the video encodes at 1fps, it's just too convenient to use the orders-of-magnitude faster h264.

I'm probably not up to date with newer/hardware encoders, so please let me know if my view is outdated.




The hardware encoders are very fast and generally better than x264 (but not by as much as you'd think with the x264 slow preset).

In addition, there are fast threaded AV1 encoders you may be overlooking, like SVT-AV1. For non-realtime, my favorite is av1an, which also yields better quality than is possible from aomenc and works with pretty much any encoder/codec: https://github.com/master-of-zen/Av1an
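
If you want a starting point, a bare-bones av1an run looks roughly like this (flags from memory, so double-check them against av1an's --help before relying on them; filenames are illustrative):

    # chunked, parallel AV1 encode via SVT-AV1; tune --workers to your core count
    av1an -i input.mkv -o output.mkv -e svt-av1 -v "--preset 4 --crf 30" --workers 4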


Hardware encoders are fine for streaming but not archiving. A CPU encoder can produce a file that is half the size or less for the same quality.


Yes, but AV1 hardware encoding beats x264 software encoding, if that's the only other option (albeit not by much).


x264's real opponents are AV1's predecessors, namely VP8 and VP9.

x265 should be what is compared against AV1 when discussing quality and encode speeds.

We use x264 for compatibility purposes: if your device is intended to play video, it will decode x264. x265 decoders are in a lot of devices at this point, and AV1 is just now starting to see representation.

x264 is like .jpeg and will probably never die.


Yeah y'all are preaching to the choir, but AVC is still the default for... Well, everything, including OP.

It will probably eventually be like mp3, where users are still reflexively encoding to it without a good reason.


Unlike mp3 compared to other audio codecs, h264 has the luxury of pretty much ubiquitous hardware decoder and encoder support relative to other video codecs, so I believe it will have an even longer life than mp3.


A nitpick: x264 is an encoder implementation, the format itself is called H.264 (and H.265).


The benchmark should be against x265 though, which is mainstream now


Is it? AFAICT most things are still on h264.


Most things where? Like, the biggest video app on Earth, called YouTube, mostly uses VP9/AV1 (it will only play H.264 if you don't have any decoding for VP9 or AV1, which is very rare).

Netflix also uses H.265/AV1. Amazon, HBO, Disney, etc. also use H.265, but I believe only at higher resolutions.

In the Apple world, even phone pics use HEVC (HEIC).

I think it's a total mix, honestly (because TikTok/X/Meta use h264 as far as I know).


Most things as in the rest of the world, where H.264 is really the only option.

More precisely: more than half of the devices currently in use in the world do not support anything beyond H.264.

Like all my smart TVs, my kitchen iMac, my parents' phones, etc. Only my phone and my laptop support HEVC.


They force AV1 and VP9 via software decoding even if your hardware doesn't support it. YouTube has been using VP9 for a decade, even without HW support at the time.

Netflix uses AV1 when on mobile data, even if the HW doesn't support it (for TVs, it needs HW decoding).

The only way to not use VP9 on Youtube is if you don't have the codec installed, which is super rare because the codec is open.


Do you have a way to reproduce this?


This comparison is not mine, and it's excellent:

https://giannirosato.com/blog/post/nvenc-v-qsv/

x264 used the medium 10 bit preset, which is a bit of an oddball because 10 bit AVC is "unofficial," and many hardware decoders don't support it.


av1an is essentially a "wrapper" around other encoders. I've played around with it a lot in the past, but have never seen any real quality gains from using it (as measured by VMAF) instead of just using the encoder directly. Am I doing something wrong? What exactly does av1an buy me (except maybe better threading)?


First some background. Av1an works by splitting videos into scenes/chunks, which it then encodes in parallel. But this also allows it to change encoding settings for each scene.

Specifically, it has a "VMAF target" mode that encodes a few frames from each scene as samples, measures their VMAF, and then boosts or reduces the encoding preset for that individual scene based on the result. It also has a "black boost" feature to allocate more bitrate to dark scenes, which encoders and various metrics tend to misrepresent.
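
As a rough illustration of that mode (the exact flag spelling may vary between av1an versions, so treat this as a sketch rather than gospel):

    # re-probe each scene and adjust its settings until it lands near the requested VMAF score
    av1an -i input.mkv -o output.mkv -e svt-av1 --target-quality 90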

Also, the possibilities with the VapourSynth support are infinite. Some simple examples include denoising a noisy video on the GPU for better quality than the encoders' CPU denoising, or deblocking with custom parameters, or upscaling with some model in PyTorch. And that's just the start:

https://vsdb.top/


There's a feature that allows you to try and target a specific VMAF score, but it doesn't work very well. What av1an does do for you is increase the speed by a lot, especially for aomenc and rav1e, which don't scale well with high thread counts. SVT-AV1 doesn't really have this problem.

A faster encode might allow using a slower speed/preset setting, which would increase quality per bitrate a little bit. But I don't really consider av1an a tool that increases encode quality, to be honest.


AV1 hardware encode just started rolling out with the most recent GPU architectures, so the majority of hardware still has to do software encoding. I'd also guess that a very large fraction of the hardware out there doesn't even have hardware decode for AV1, since it was only the last generation of GPUs that got that. AV1 is mainly solving problems for the platform owners (Google/Netflix/Facebook, etc.), and h264 will probably serve typical users for years to come, especially if they have that one device they want to keep using that only supports h264.


In the past (with h265 / h264 at least), hardware encoding always ended up with visibly worse quality (and often even bigger file sizes) compared to a software encoder like x264/x265.

Do you happen to know if that's still the case?

(I guess for use-cases such as live streaming it doesn't matter that much, but for video that ends up in some archive, it's probably less acceptable)


That's usually the case as the hardware encoders tend to make tradeoffs in the direction of lower transistor count / faster frame processing while software encoders have the luxury of going for higher quality.


Yes, a YouTuber named EposVox released a video on AV1 hardware encoding when the first Intel dGPUs with support for it were released: https://www.youtube.com/watch?v=ctbTTRoqZsM

Later on in the video, there are some graphs comparing Intel's AV1 encoder to SVT-AV1 at different speed presets. Even one of the faster presets (9) will comfortably stay above the hardware encoder's quality according to VMAF, and if you don't need real-time speeds you can lower the preset to get further ahead of the hardware encoder. (BTW: that video is >1 year old now, and SVT-AV1 has had some significant updates in the meantime too, so the software side is probably looking even better now.)


It's around 5% (maybe 10%?) larger file sizes for the same visual quality at the moment. For archival I think that's fine, as storage is cheap, though it can still be a problem when you pay for outbound bandwidth to users.


Hardware encoding gives up a little quality and file size, but hardware encoding of AV1 will generally beat software encoding of x264 on all axes.


> I'd also guess that a very large fraction of the hardware out there doesn't even have hardware decode for AV1 since it was only the last generation of GPUs that got that

Don't underestimate dav1d. It's a highly optimized software AV1 decoder:

https://code.videolan.org/videolan/dav1d

On my nine year old system, 1080p60 AV1 video was unwatchable with early releases of dav1d due to too many dropped frames.

Eventually dav1d got enough AVX optimizations to play the same video on the same hardware with zero dropped frames.

It was an impressive demonstration of what can be achieved when software makes the most of the available hardware.
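
If you want to see how your own machine fares, ffmpeg (which typically uses dav1d for AV1 decoding when built with it) has a handy null-output benchmark mode:

    # decode as fast as possible, discard the frames, and print timing stats
    ffmpeg -benchmark -i av1_sample.mkv -f null -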


Yeah, it will still be a while before anyone can fully get away from h264, but with Apple adding AV1 decoders to their latest chips, hopefully all the wheels are at least in motion now.


Same here. I'm not willing to pony up money for hardware/compute to encode AV1.

I encode x265 on a $2.50/month VPS with 4 GB of RAM during the off hours. AV1 is almost an order of magnitude slower.


That sounds crazy slow. Look into what flags you're using, and choose a setting that reduces CPU in exchange for compressing less.

All the major codecs have flags for you to balance speed against quality and compression, and it's up to you to pick the right tradeoffs for your use case.

And for most purposes, you want to use software encoders because they're much more flexible in terms of flags/options than hardware encoders. (Hardware encoders are usually optimized for speed rather than quality; they're for live capture more than for video conversion.)
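
For example, with ffmpeg's SVT-AV1 wrapper the two main knobs are -preset (higher = faster, lower = better compression) and -crf (lower = higher quality). A sketch with illustrative values:

    # fast-ish AV1 encode; drop -preset toward 4 for archival-grade quality
    ffmpeg -i input.mp4 -c:v libsvtav1 -preset 8 -crf 35 -c:a copy output.mkv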


How are you getting 1fps encoding? I see 332fps on my 12900K with preset 10.

    -vf scale=1280:720 -c:v libsvtav1 -crf 30 -preset 7 -c:a libopus -b:a 96k -ac 2


Preset 10 is pretty fast, but you're disabling a lot of the features that allow better quality and compression ratios (see the README in the SVT-AV1 repo). For archiving, 3 or 4 is probably good.


1. Compile this: https://gitlab.com/AOMediaCodec/SVT-AV1

2. ffmpeg -i infile.mp4 -map 0:v:0 -pix_fmt yuv420p10le -f yuv4mpegpipe -strict -1 - | SvtAv1EncApp -i stdin --preset 6 --keyint 240 --input-depth 10 --crf 30 --rc 0 --passes 1 --film-grain 0 -b infile.ivf

3. ffmpeg -i infile.ivf -i infile.mp4 -map 0:v -map 1:a:0 -c copy outfile.mp4


Tricks of the trade: why have one computer compress the file when you can split it up into logical segments and have each segment sent to its own encoder?
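
A rough sketch of the whole trick with stock ffmpeg (filenames and values here are just illustrative; note that -c copy can only split on keyframes):

    # 1. split losslessly into ~10-minute parts
    ffmpeg -i movie.mkv -c copy -f segment -segment_time 600 \
        -reset_timestamps 1 part_%03d.mkv
    # 2. encode each part (in practice, ship these to separate machines)
    for f in part_*.mkv; do
        ffmpeg -nostdin -i "$f" -c:v libsvtav1 -preset 6 -crf 30 -c:a copy "enc_$f"
    done
    # 3. stitch the encoded parts back together without re-encoding
    printf "file '%s'\n" enc_part_*.mkv > list.txt
    ffmpeg -f concat -safe 0 -i list.txt -c copy movie_av1.mkv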


Back when I still cared about saving disk space, I made a cluster of Nvidia Jetson Nanos running in a Docker Swarm configuration [1] to compress my Blu-ray rips, but honestly, even with six computers working at once, H264 on a single computer was still often faster.

On the Jetson Nanos I was lucky to get maybe 1fps in ffmpeg using VP9. Multiply that by six boards and that's about 6fps in total; ffmpeg running x264 in software mode was getting around 11fps on a single board, not even counting using the onboard encoder chip, meaning that I was getting better performance from one board using x264 than all six using VP9.

Now obviously this is a single anecdote on specific hardware, so I'm not saying that this applies to every single case, but it's a big reason why I personally have not used VP9 for anything substantial yet.

[1] https://gitlab.com/tombert/distributed-transcode


h.264 is from the 90s, so of course it's fast after ~30 years of use. Hell, when I first got into encoding, we had dedicated expansion cards to do MPEG-1/MPEG-2 encoding because it was so difficult at the time. New codecs always take time in the beginning while the encoding software is tweaked/optimized. Eventually, it becomes part of the CPU hardware, and then we all make comments like "remember when ____ was so slow?" One day, you'll regale the young whippersnappers on internet forums about how painfully slow AV1 encodes were when they start complaining about how newHotnessEncoder5000 is so slow.


Oh definitely, no argument here. I'm 100% OK with AV1 becoming the standard "video codec to rule them all", but I'm saying that in the short term, it's difficult to recommend AV1 or VP9 over h264 (at least for personal use). H264 encodes 10x faster, still gives reasonably decent compression, is supported by basically every consumer device [1] and browser out of the box, and very soon will have all of its patents expired, meaning that it will be truly royalty-free. x264 in particular is extremely nice in my experience, doing a lot to really squeeze a lot of quality out of a relatively small amount of space.

That said, AV1 is very obviously the future, and I'm perfectly happy with it taking over the market from h264, and I think that due to the bandwidth savings it's only a matter of time before all the major video services make it the default, especially as the speed of encoders increases to a useable level, which I'm sure it will soon enough.

[1] I know the most recent Raspberry Pi doesn't have a decoder chip for h264, but I think it's fast enough to do it in software.


Raspberry Pis have had a hardware decoder for h264 for as long as they've existed (I think?), but it was dropped in the most recent version. I don't understand why.

They've recently contributed non-trivial patches to Firefox to use the embedded Linux API for video hardware acceleration (V4L2, vs. VAAPI on desktop, which we also support), and are shipping the h264ify extension with their Firefox build to prefer that codec for their users, so that the experience is good on older devices.

Maybe the 5 is so much faster that it's not needed as much, but h264 represents so much content that it feels a bit surprising anyway.

But I'm just a software person, hardware is complicated differently.


> h.264 is from the 90s, so of course it's fast after ~30 years of use.

If only! Then the patents would have expired. But H.264 is newer than MPEG-4 Part 2.

But you're right: H.264 has had the advantage of time, to gain fast hardware support.


Ok. You are allowed to think that, but stop forcing AV1 down my throat just because you think CPUs will be both cheap and fast in the faraway future, because as far as I am aware, I exist in the present moment until that future arrives.


>h.264 is from the 90s

I am pretty sure you are thinking of H.263 if it was from the 90s. H.264 only got started in the 00s.


This is done particularly when you are implementing adaptive bitrates (the thing Netflix uses, where it automatically sends you a higher or lower quality picture depending on your Internet connection).

In the adaptive bitrate world, you split a video up into fragments, say 2-10 seconds long, and encode each fragment at multiple bitrates, so that every, say, 5 seconds the video player can make a decision to download a different quality for the next 5 seconds.

Ok, but why not split the file up for standard encoding? Well, you can't just concatenate two .mp4 together without re-encoding and have it make sense to most media players (as far as I am aware), and moreover, it's inefficient from a RAM perspective. 1 second of RAW uncompressed 4k (24 fps) video is about 600MB. Source content for a single episode/movie at Netflix (I don't work there, just something I read once) can easily reach into the terabytes.
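
For a concrete picture, one rung of such a bitrate ladder can be produced with ffmpeg's HLS muxer (values here are placeholders, not a tuned ladder):

    # one quality level, cut into ~5-second segments plus a playlist
    ffmpeg -i episode.mp4 -c:v libx264 -b:v 3000k -c:a aac -b:a 128k \
        -f hls -hls_time 5 -hls_playlist_type vod 720p.m3u8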


> Ok, but why not split the file up for standard encoding? Well, you can't just concatenate two .mp4 together without re-encoding and have it make sense to most media players (as far as I am aware)

You can't just literally `cat foo-[123].mp4 > foo.mp4` with old-school non-fragmented .mp4 files, but you just have to shuffle the container stuff around a bit. You don't need to re-encode.
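
Concretely, ffmpeg's concat demuxer does that container-level shuffle for you, no re-encode needed:

    printf "file '%s'\n" foo-1.mp4 foo-2.mp4 foo-3.mp4 > list.txt
    ffmpeg -f concat -safe 0 -i list.txt -c copy foo.mp4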

One downside is that if you decide ahead of time that you're going to divide the video into fixed 5-second fragments/segments/chunks to encode independently, you're going to end up with closed GOPs of that length that don't match scene transitions or the like: an IDR frame every 5 seconds, so no B/P frames that reference stuff 10 seconds ago, no periodic incremental refresh, nothing fancy.


Okay, I don't agree with anything in your reply. Segmenting a video file for HLS/DASH delivery is not at all the same thing I'm suggesting. Just for the sake of round numbers, I'm saying to take a 90-minute feature into nine 10-minute segments. Fire up 10 separate instances to encode each 10-minute segment. You've just increased the encode time 10x. Also, DASH/HLS does not require segmented files. You can have a single contiguous file like an fMP4 as a DASH/HLS source.

>Ok, but why not split the file up for standard encoding?<snip>

At this point, you would be better served by just writing an elementary stream rather than a muxed mp4 file, since it's just a segment anyway, so why waste the cycles on muxing? You then absolutely 100% can concat those streams (even if you did mux them into a container). If you think you can't, you clearly have not tried very hard.

>I don't work there, just something I read once

I don't work there either, but I do have 30+ years of experience with this subject. Sadly, you're not as well informed as you might think. People don't tend to encode to AV1 from RAW. They instead are dealing with a deliverable file, most typically ProRes in today's world, after the post process has been completed. Nowhere near terabytes for a feature; more like a couple hundred gigabytes for UHD HDR content. You seem to be unnecessarily exaggerating.

Edit: it's a 10x increase in encode speed, not time. That would be the opposite effect.


Why did the encode time increase by 10x in that instance? Can't you just seek in the video to the I-frame before the cut point and start your encode there?

I've never tried merging streams across computers so was naively just thinking that your output from each computer would be an MP4 but that makes sense.

I pulled that info from a Netflix talk; perhaps video cameras back when that talk occurred didn't compress the video for you? Besides, isn't IMAX all intra-encoded? It was my understanding that IMAX films are actually just a series of J2K images, so I would imagine that the video cameras used there would also be intra-encoded.


s/increase/decrease/

I was thinking "increased the encode speed 9x" but typed "increased encode time". I also swapped the number of segments with the segment duration: 9 segments of 10 mins = 9x increase in performance.

Sounds like you are confusing Netflix's recommended formats for acquisition vs. delivery. Cameras capture RAW formats (rarely uncompressed, though), and the post houses use those as sources. The post house/color correction will then create the delivery formats, which are typically ProRes. RAW is not a friendly format for distribution in the slightest. The full workflow from camera original to what ends up being streamed to the end viewer changes formats multiple times through the process.


Gotcha, that makes much more sense


For certain containers/codecs you can concatenate files without re-encoding. I do it quite often with ffmpeg using -c copy, and it's basically at the speed of the disk.


The parent comment I was responding to was discussing splitting encodes across multiple computers and then re-combining, which is what I was referring to. Still, it sounds like it is possible and I was incorrect.


Because the most common situation where a person would be encoding video is when they are live capturing it from a camera. If you are making a home security camera, recording videos, etc., you need an encoder that can keep up.


Yep, that's what Bitmovin does.


I'm pretty sure that's what everyone does after the first time they try a test encode and see the dismal speeds. It's a trick as old as time. The trick is to make that segment decision better than something like the YT algo that decides where to place an ad break.


My understanding is there is some support for GPU-accelerated AV1 encoding in top-of-the-line latest-gen discrete GPUs that can be used for live streaming with OBS Studio 29.1 or higher. Helpful if you are in a situation where you have extra GPU and are tight on bandwidth.

In my opinion it's still in the early-adopter phase though, and it's perfectly valid to use tried-and-true codecs for user-interactive rendering and encoding cases, or where the existing codec meets your requirements for the compute vs disk/bandwidth trade-off.
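
Outside OBS, the same hardware is reachable through ffmpeg; for example, NVENC AV1 on an RTX 40-series card looks something like this (assuming an ffmpeg build with nvenc enabled; values are illustrative):

    # hardware AV1 encode; p1 = fastest preset, p7 = best quality
    ffmpeg -i input.mp4 -c:v av1_nvenc -preset p5 -cq 30 output.mkv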


In terms of software encoding, that is still the case. You are basically making a trade-off between bitrate/file size and encoding time.

We could have EVC, which provides the same H.264 (or x264) encoding time but a 30%+ bitrate reduction. But the market seems to want maximum bitrate reduction regardless of computational complexity.


I care mainly about archiving, so 1fps encodes for 24fps content are within the range that I consider okay.

However, I ran into issues with decode speed too; whatever decoder Kodi was using 18 months ago struggled to decode high-bitrate 1080p AV1 video last time I tried. Maybe this weekend I'll try again on the current version of Kodi.


Try encoding with a "low-latency" or a "fast-decode" parameter, and see if that is acceptable to you. Keep in mind not all AV1 encoders are created equal.
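
With ffmpeg's SVT-AV1 wrapper, for instance, that parameter goes through -svtav1-params (a sketch assuming a reasonably recent ffmpeg/SVT-AV1):

    # trade a little compression efficiency for an easier-to-decode bitstream
    ffmpeg -i input.mkv -c:v libsvtav1 -preset 6 -crf 30 \
        -svtav1-params fast-decode=1 -c:a copy output.mkv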

If you're on Windows I recommend using StaxRip for encoding.


I'm on Linux, currently using ffmpeg for encoding. I'll look for options that might improve decodability, but I seem to recall I had to go pretty aggressive with both of the AV1 encoders I tried to get any savings vs. H.265.


It depends on what bitrate you are targeting, and the source content.

AV1 does not outperform H265 at high bitrates (and in certain cases, medium bitrates). What counts as "high bitrate" depends on the source content, but a good rule of thumb is that 40 Mbit/s (think Blu-ray quality) or more for 4k content almost always goes to H265.

For AV1, depending on what you are encoding, and how much extra time you want to dedicate to experimentation, take a look at grain synthesis (you will want to test decode capabilities on that one).
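
For SVT-AV1 specifically, grain synthesis is a strength parameter (the pipeline upthread disables it with --film-grain 0); through ffmpeg it looks something like this, with the right level being very content-dependent:

    # synthesize grain at strength 8 instead of spending bitrate encoding it
    ffmpeg -i input.mkv -c:v libsvtav1 -preset 5 -crf 27 \
        -svtav1-params film-grain=8 output.mkv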


Haven't tested it yet. I'm waiting for OBS to enable hardware-accelerated AV1 encoding on Linux.



