Linux kernel VP9 codec V4L2 control interface (iu.edu)
86 points by mfilion 8 days ago | 50 comments





Unfortunately userspace (e.g. VAAPI, ffmpeg) support for this is not done. Until VAAPI support is implemented, videos in Firefox will be unaccelerated. I think it is the same deal for Chrome.

HW acceleration on Linux was fixed about a year ago https://9to5linux.com/firefox-81-enters-beta-gpu-acceleratio...

Firefox uses VA-API. That library does not support this hardware.

Edit: as explained below, the linked work is for specific ARM hardware like the rk3399 SoC.


> Until VAAPI support is implemented, videos in Firefox will be unaccelerated. I think it is the same deal for Chrome.

That is an issue with Firefox; other software (mplayer, mpv) has supported VAAPI for many years. And with youtube-dl integration in mpv, why even play videos in Firefox?
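For what it's worth, a dry-run sketch of that mpv workflow (the URL below is a placeholder, not a real video; `--hwdec=vaapi` is mpv's flag for requesting VA-API decoding, and mpv's built-in youtube-dl hook resolves the stream):

```shell
# Dry-run sketch: print the mpv command line one might use. mpv's ytdl hook
# resolves the YouTube URL to a stream; --hwdec=vaapi requests VA-API
# hardware decoding. The URL is a placeholder.
URL="https://www.youtube.com/watch?v=EXAMPLE"
CMD="mpv --hwdec=vaapi $URL"
echo "$CMD"
```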


Nope, not an issue in Firefox but an issue in VAAPI. Firefox supports VAAPI just fine; VAAPI does not support this hardware/API. Considering it is a new API, I am still holding out hope that support gets added.

https://github.com/noneucat/libva-v4l2-request#branch=fix-ke...

And there's probably some branch somewhere that supports VP9 too.


This solution isn't great; there is direct support for the V4L2 codec API inside ffmpeg, so VAAPI is not useful in this case.

Wouldn't the gstreamer support that is mentioned in the patch description directly enable hardware acceleration in Firefox? Or do I misunderstand to what extent Firefox is using gstreamer at the moment?

Firefox does not use gstreamer at all AFAIK.

Ah, I confused it with ffmpeg, which also uses vaapi of course.

Also, gstreamer was added for something at one point, but that was 7 years ago; I guess getting video decoding running was a different story.

https://wiki.mozilla.org/index.php?title=Special:Search&limi...


The usefulness of hardware acceleration for video decoding is highly debatable.

1) It's not always much more energy efficient; sometimes it is, but by less than you'd think, since GPUs need power too

2) It greatly increases the complexity of client software, which has to implement both accelerated and unaccelerated decoding, leading to poorer software quality

3) Driver quality is usually terrible: lists of working hardware/software combinations have to be maintained, and in some cases holes have to be punched in sandboxes [1]

4) HW support usually lags behind state-of-the-art encoding. YouTube is already using AV1, but the vast majority of devices won't support it in hardware before something else comes along

5) Highly optimised decoders, such as dav1d, are extremely effective and save bandwidth and power compared to HW VP9.

EDIT: I'm mostly talking about the desktop/laptop use case here, where things are very fragmented. On a mobile phone, where manufacturers control hardware and software end to end, it's a different story.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1698778


> The usefulness of hardware acceleration for video decoding is highly debatable.

Disagree. On low-end hardware the advantages are clear. On my older Intel NUC I can play 1080p H.264 (using mpv) hw-accelerated with 15% CPU load, or software-decoded with 75% CPU load. In the first case the NUC is silent; in the second, core temperature rises and eventually its fan starts spinning.


> On my older Intel NUC i can play 1080p H.264 (using mpv) hw-accelerated with 15% cpu load, or software decoded with 75% cpu load

These numbers are meaningless without measuring watt-hours used for the task.

I was able to play 1080p H.264 video with hardware acceleration on an 8800 GS with an Athlon X2 5000, with about the same CPU utilization, back in 2008-2009. There was a special (shareware) library that enabled HW acceleration way before it was commonplace on integrated GPUs. I forget what it was called, but it was Nvidia/CUDA only.

That was 12+ years ago.

Obviously GPUs have become more efficient since then, but so have CPUs. How efficiently the video stream was encoded also matters. It's entirely possible that under certain options, hardware decoding's advantages are almost entirely negated.
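A sketch of how one might actually get the watt-hour numbers asked for above, assuming an Intel machine that exposes RAPL energy counters through perf (the media path is a placeholder; check `perf list` locally for the exact event name):

```shell
# Dry-run sketch: print a perf invocation that records package energy for a
# software-decode run (needs root; Intel RAPL). --hwdec=no forces software
# decoding; --vo=null/--ao=null drop output so mostly decoding is measured.
CMD="perf stat -a -e power/energy-pkg/ -- mpv --hwdec=no --vo=null --ao=null /path/to/clip.mkv"
echo "$CMD"
```

Running the same command with `--hwdec=vaapi` and comparing the Joules reported gives the energy comparison that raw CPU-load percentages can't.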


Also, there are "levels" of hardware acceleration - using CUDA (or any other shader-level acceleration) will always be less efficient than a dedicated hardware block.

And there are multiple steps in decoding a video - some steps in some codecs fit different acceleration schemes better, so a full-pipeline decode may not be worth the hardware cost at one point, but later transistors get cheaper, or new HW decode techniques are discovered, and more steps can move into dedicated hardware blocks. Those blocks may also have hard limits - if one can only cope with, say, 1080p60 at a certain profile level for a codec, trying to do anything beyond that will likely skip the HW block entirely; it's hard to do any kind of "hybrid" decode if a whole pipeline step isn't covered.

"HW Video Decode Acceleration" isn't a simple boolean.


Hybrid decoders that use GPU shaders are somewhat rare; HW decoding pretty much always means "ASIC". And ASIC power draw for decoders is typically in the <1W range.

For dav1d, even YouTube-tier 1080p SW decoding is using +4-5W on my laptop, and 4k60 is +15-20W.


> ASIC power draw for decoders is typically in the <1W range.

Many times even "standalone" HW decoders use or share GPU components (e.g., almost always the memory). Just bumping up the GPU's memory controller clock already consumes >10W on my system.


it's hard for me to imagine that video decoding would need a significant bandwidth boost like this to run. that seems like either a driver or hardware issue, and one that ought to be solvable. 4k60 is 12Gbps. even inflating that number a bunch, it's hard to imagine most discrete graphics card memories needing more than their base clocks to serve this.

on mobile at least, where graphics are integrated & use main memory, there ought to be little/no difference in memory throughput use.

last, some new GPUs like AMD's Navi (RX6xxx) have on-package caches ("Infinity Cache") of, i think, 64-128MB. i want to think these could be used like Intel's Crystal Well L4 eDRAM, to keep from needing to go to main memory at all. how much of a win that is, if any, & whether it would even be possible, i'm not sure.

i'm somewhat skeptical that there really is a problem here. if there is, i suspect it's somewhat rare & probably a bit of an oversight. i should test though. i would love to get a wider picture of what the real impacts of video decoding are.


>HW decoding pretty much always means "ASIC"

Indeed. For example, hardware decoding is the difference between choppy video and smooth video on the PinePhone because the CPU isn't powerful enough and the GPU is useless for decoding.

(And to fguerraz's edit that their comment doesn't apply to mobile phones "where manufacturers control hardware and software end to end", the manufacturer does not control the software on the PinePhone.)


Yes, again, I'm talking about PCs here, where it's usually implemented in shaders.

No it isn't. "NVDEC" is an actual ASIC block in the GPU silicon. It's not "shaders". Same with AMD's VCN. And Intel's QuickSync.

If it was just shaders then there'd be basically no concerns with driver quality or hardware support, just like there aren't with CPU decoders.


To be exact, it depends on the generation of hardware. At least for Intel and AMD, the first versions tend to use more shaders; then they switch to ASICs. Intel actually open-sourced the shaders that they use.

So was I? Which phone can even achieve a 20W power draw...

The only hybrid VP9 decoders were AMD's that only supported Windows, which they stopped shipping years ago (any current/Linux AMD drivers that support VP9 decoding only do so via an ASIC), and Intel's that was only supported on 3 generations of GPUs (Gen7.5, Gen8, and Gen9) and is obsoleted with an ASIC in Gen9.5.


But the email is about ARM SoCs with dedicated VPU IP blocks.

>"2) It increases greatly the complexity of client software that has to implement both accelerated and unaccelerated decoding, leading to poorer software quality"

I happen to have my own product with just that - software and HW-accelerated decoding. It plays videos in a few resolutions, and the presence of HW acceleration allowed me to offer 4K videos (first on the market in my segment) with close to 0% CPU consumption on low-end PCs. Competitors at that stage would not even dream of offering 4K content.

As to "poorer software quality" - please do not spread FUD. I just looked at the source code: the HW-accelerated path (decodes from source to a DirectX texture) added a minuscule 1200 lines of code, a good chunk of which are headers/declarations. The software is used by tens of thousands of clients and I have about zero reports of enabling HW decoding leading to an error.


This is a kernel API for VPUs, not for GPUs.

Power reduction is not really questionable. You can't really achieve smooth playback at full resolution without a VPU on the devices where these things are used.


"It increases greatly the complexity of client software that has to implement both accelerated and unaccelerated decoding, leading to poorer software quality"

Software without functionality is really simple! The same argument applies to supporting Unicode, both text directions, high-DPI scaling, catering to the visually impaired, or having any sound more complex than MIDI.


> 2) It increases greatly the complexity of client software that has to implement both accelerated and unaccelerated decoding, leading to poorer software quality

Only if you are not using any abstraction layers. GStreamer should take care of using a hardware decoder if available, and otherwise fall back to software decoding.
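As a sketch of that fallback behavior: a `decodebin`-based pipeline auto-plugs whichever decoder element ranks highest at runtime (e.g. a hardware `v4l2slvp9dec` where present, a software decoder otherwise); the file path below is a placeholder:

```shell
# Dry-run sketch: print a gst-launch-1.0 pipeline. decodebin selects the
# highest-ranked decoder available at runtime, so the same pipeline uses a
# hardware decoder when one exists and falls back to software otherwise.
PIPELINE="filesrc location=/path/to/video.webm ! decodebin ! videoconvert ! autovideosink"
echo "gst-launch-1.0 $PIPELINE"
```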


> The usefulness of hardware acceleration for video decoding is highly debatable.

No it isn't. There's a reason it's used on 99% of consumer devices. Hardware companies are generally not in the business of adding to the BOM cost for no reason. Linux alone is the outlier.

> It's not always much more energy efficient, but it sometimes is, but less than you'd think, GPUs need power too

"As you can see a GPU enabled VLC is 70% more energy efficient than using the CPU!"

https://devblogs.microsoft.com/sustainable-software/vlc-ener...

chrome-hw showing 1/4th the power consumption of chrome-sw on the same video on more recent Apple M1: https://singhkays.com/blog/apple-silicon-m1-video-power-cons...

Also hardware decoders have consistent performance, which is not true of CPU-based decoders. This is especially problematic & obvious at high resolutions. Windows & MacOS ultrabooks can do 4k video all day long without an issue. Linux ultrabooks get noticeably choppy at 1440p and 4k is right out.

This is also why you'll find ultra-low end SoCs regularly prioritizing hardware decoders over faster CPUs, notably those in every smart TV & the majority of TV streaming dongles/sticks/boxes. Which really shouldn't be surprising, fixed-function hardware has always been drastically more efficient than programmable hardware, and video has changed nothing about that.

> 2) It increases greatly the complexity of client software that has to implement both accelerated and unaccelerated decoding, leading to poorer software quality

Sounds like a job for a library, which is how every other OS makes this a non-issue.

> 4) HW support usually lags behind state of the art encoding. Youtube is already using av1, but the vast majority of devices won't support it in hardware before something else comes up

Youtube also still uses VP9 so that power efficiency didn't regress on existing hardware, and mid-tier TV SoCs with AV1 decoder support are already here (such as the Amlogic S905X4). Sony's 2021 BRAVIA XR line also has HW AV1 decoders up to 4k.

> 5) Highly optimised decoders, such as dav1d, are extremely effective and save bandwidth and power compared to HW VP9.

Care to back that up with a source? All I can find are statements that dav1d is fast, but no evidence that it is efficient. The only thing I can find is this: https://visionular.com/en/av1-encoder-optimization-from-the-...

which has dav1d using more power than ffmpeg-h264 but less than openhevc - but those are also software decoders which, as above, take significantly more power than hardware decoders for the same codecs.


Disagree.

I can run multiple 1080p Twitch streams with mpv (using streamlink and setting appropriate decoder flags), while using Chromium to watch even one stream puts a lot of strain on my laptop and gets the fan running immediately.

So from my perspective it is very useful to offload video decoding to the GPU and leave CPU cycles for other work. Is it more energy efficient? I never checked, but the GPU fan does not really spin any faster, and looking at the temperature graphs it does not seem to really strain it.

I tried enabling GPU acceleration in the browser (Chromium-based) and I still don't really know why it is so flaky and unreliable.


AFAIK the rk3399 is special in this area: its codecs need no binary blobs. This means it can encourage other vendors to do the same and get RYF-certified. ARM SBCs based on the rk3399 could become the only modern, affordable systems with RYF certification.

Now if only the standard release for the PineBook Pro used a kernel newer than 5.7 so we could get hardware codecs.

Might just have to sit down and figure out how to cross compile Gentoo for it.


I mean, compiling your own kernel and compiling the OS are very different things. I am running 5.14 on my PBP right now.

No support for external displays, though.


You don't need to go full Gentoo to install a custom kernel. From Manjaro you could even pull the linux-mainline AUR package and just build+install that (or linux-git or any of the others), if you want the easy way out.

It was more of a "there are a ton of packages which also need to be patched" kind of thing as well.

Try armbian

I don't think the Allwinner codecs need binary blobs.

Could someone explain the significance of this, why it took so long, and what it opens up?

VP9 is an open source, royalty-free video codec. It was developed by Google to provide a free alternative to the likes of H.264 and H.265, which require paying licence fees to implement.

Codecs can be implemented in software, but also in hardware, which is much more power efficient. This change enables the Linux kernel to drive hardware VP9 decoders, so that software can decode (play) VP9 video much more efficiently when that hardware is available.
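A hedged sketch of how one might check for such a decoder on a given system (`v4l2-ctl` comes from the v4l-utils package; the snippet degrades gracefully when no devices or tools are present):

```shell
# Enumerate V4L2 devices and list the formats on their OUTPUT queue, where
# a decoder advertises the compressed formats (e.g. VP9) it accepts.
FOUND=0
for dev in /dev/video*; do
    [ -e "$dev" ] || continue
    FOUND=1
    if command -v v4l2-ctl >/dev/null 2>&1; then
        v4l2-ctl -d "$dev" --list-formats-out || true
    else
        echo "device present: $dev (install v4l-utils to inspect it)"
    fi
done
[ "$FOUND" -eq 1 ] || echo "no V4L2 devices found"
```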


Thank you for the clear explanation.

It looks like it implements hardware acceleration of the VP9 codec for some specific hardware (Rockchip VDEC and Hantro G2). This opens up playing, for example, lots of YouTube videos with less CPU usage on devices with that hardware. I can't comment on whether or not it "took so long" as I have no idea which hardware this is.

The title makes it out to be something fundamental in Linux, but this is just one driver becoming more complete.


It's a new media subsystem userspace API, not just a new driver, and the API will be stable from the get go, instead of languishing in the staging area, like the H.264 one.

Is the h264 one going to move out of staging anytime soon?

I just wonder how the presence of a niche codec in the kernel affects kernel size and performance.

I would be glad if everybody used it so it became mainstream, but the reality is H.264 and H.265.


This is a hardware driver that conforms to a standard interface, it doesn't implement a VP9 codec in software in the kernel.

This is a driver interface. If it is not standardized, we get that ugly situation where a userspace app only works with hardware from a specific vendor.

It doesn't have to only work with hardware from a specific vendor. It can unlock specific features (like hardware codec acceleration) when it detects specific hardware presence.

And it doesn't have to be a userspace application, it can be a userspace library - e.g. something like GStreamer.


YouTube uses VP9, so I wouldn't call it niche.

Yes, unless you force it to H.264 it'll default to VP9 - it's Google, after all.

Many videos are not available in VP9. I noticed this a couple of years ago when I had to use vanilla Ubuntu without the "install 3rd-party software" checkbox checked during installation - Firefox refused to play many YouTube videos.

It also supposedly makes sense to force H.264 to increase chances of hardware acceleration being used.


> Many videos are not available in VP9.

That's just not true.



