Edit: as explained below, the linked work is for specific ARM hardware like the rk3399 SoC.
That is a Firefox issue; other software (mplayer, mpv) has supported VAAPI for many years. And with youtube-dl integration in mpv, why even play videos in Firefox?
And there's probably some branch somewhere that supports VP9 too.
Also, GStreamer support was somehow added for something, but that was 7 years ago; I guess getting video decoding running was a different story.
1) It's not always much more energy efficient. It sometimes is, but by less than you'd think: GPUs need power too.
2) It greatly increases the complexity of client software, which has to implement both accelerated and unaccelerated decoding, leading to poorer software quality.
3) Driver quality is usually terrible: lists of working hardware/software combinations have to be maintained, and in some cases holes have to be punched in sandboxes.
4) HW support usually lags behind state-of-the-art encoding. YouTube is already using AV1, but the vast majority of devices won't support it in hardware before something else comes along.
5) Highly optimised decoders, such as dav1d, are extremely effective and save bandwidth and power compared to HW VP9.
EDIT: I'm mostly talking about the desktop/laptop use case here, where things are very fragmented. On a mobile phone, where manufacturers control hardware and software end to end, it's a different story.
Disagree. On low-end hardware the advantages are clear. On my older Intel NUC I can play 1080p H.264 (using mpv) hardware-accelerated at 15% CPU load, or software-decoded at 75% CPU load. In the first case the NUC is silent; in the second, the core temperature rises and eventually its fan starts spinning.
These numbers are meaningless without measuring watt-hours used for the task.
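One way to actually get watt-hours on Linux is the RAPL energy counter that Intel CPUs expose via sysfs. A minimal sketch, assuming an Intel machine (the sysfs path varies by system and is an assumption here):

```python
import time  # used by the commented sampling sketch below

# Intel-only; exact path is an assumption and differs per machine.
RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"

def avg_watts(uj_start: int, uj_end: int, seconds: float) -> float:
    """Average package power from two RAPL readings (microjoules)."""
    return (uj_end - uj_start) / 1e6 / seconds

def watt_hours(avg_w: float, seconds: float) -> float:
    """Total energy used for the task, in watt-hours."""
    return avg_w * seconds / 3600.0

# Sampling sketch (requires the RAPL sysfs file; counter can wrap):
# start = int(open(RAPL_ENERGY).read()); t0 = time.time()
# ... play the video ...
# dt = time.time() - t0
# w = avg_watts(start, int(open(RAPL_ENERGY).read()), dt)
# print(f"{w:.1f} W avg, {watt_hours(w, dt):.3f} Wh total")
```

Running the same video once with `--hwdec` and once without, and comparing the watt-hour totals, would settle the question for a given machine.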
I was able to play 1080p H.264 video with hardware acceleration on an 8800 GS with an Athlon X2 5000, with about the same CPU utilization, back in 2008-2009. There was a special library (shareware) that enabled HW acceleration well before it was commonplace on integrated GPUs. I forget what it was called, but it was Nvidia/CUDA only.
That was 12+ years ago.
Obviously GPUs have become more efficient since then, but so have CPUs. It also matters how efficiently the video stream was encoded. It's entirely possible that under certain encoding options, hardware decoding's advantages are almost entirely negated.
And there are multiple steps in decoding a video. Some steps in some codecs fit different acceleration schemes better, so at a given point it may not be worth the hardware cost to implement the full decode pipeline; later, transistors get cheaper or new HW decode techniques are discovered, and more steps can be done in dedicated hardware blocks. Those hardware blocks may also have hard limits: if a block can only cope with, say, 1080p60 at a certain profile level for a codec, trying to do anything beyond that will likely skip the HW block completely, since it's hard to do any kind of "hybrid" decode if it's not a whole pipeline step.
"HW Video Decode Acceleration" isn't a simple boolean.
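The all-or-nothing fallback described above can be sketched as a capability check. The limits and the `pick_decoder` helper below are hypothetical; real players query them from the driver (e.g. via VAAPI profile/level queries), not from a hardcoded table:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HwCaps:
    """Hypothetical fixed-function decoder limits for one codec."""
    max_width: int
    max_height: int
    max_fps: int
    profiles: frozenset

# Assumed example: an H.264 block capped at 1080p60, common profiles only.
H264_BLOCK = HwCaps(1920, 1080, 60, frozenset({"baseline", "main", "high"}))

def pick_decoder(caps: HwCaps, width: int, height: int,
                 fps: int, profile: str) -> str:
    # If the stream exceeds ANY fixed-function limit, the HW block is
    # skipped entirely and we fall back to full software decode; there
    # is no partial/hybrid path once a limit is exceeded.
    fits = (width <= caps.max_width and height <= caps.max_height
            and fps <= caps.max_fps and profile in caps.profiles)
    return "hardware" if fits else "software"

print(pick_decoder(H264_BLOCK, 1920, 1080, 60, "high"))  # hardware
print(pick_decoder(H264_BLOCK, 2560, 1440, 60, "high"))  # software
```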
For dav1d, even YouTube-tier 1080p SW decoding is using +4-5W on my laptop, and 4k60 is +15-20W.
Many times even "standalone" HW decoders use or share GPU components (almost always the memory, for example). Just bumping up the GPU's memory controller clock already consumes >10W on my system.
On mobile at least, where graphics are integrated and use main memory, there ought to be little or no difference in memory throughput use.
Lastly, some new GPUs like AMD's Navi (RX 6xxx) have on-package caches ("Infinity Cache"), I think between 64 and 128 MB. I want to think this could be used like Intel's Crystal Well L4 eDRAM, to avoid needing to go to main memory at all. How much of a win that would be, if any, and whether it's even possible, I'm not sure.
I'm somewhat skeptical that there really is a problem here. If there is, I suspect it's somewhat rare and probably a bit of an oversight. I should test, though; I would love to get a wider picture of what the real impacts of video decoding are.
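For a rough sense of whether decode working sets could fit in a 64-128 MB cache, uncompressed frame sizes are easy to estimate. A back-of-the-envelope sketch, assuming 8-bit YUV 4:2:0 (the common case for consumer video):

```python
def frame_bytes(width: int, height: int, bits: int = 8) -> int:
    """Uncompressed YUV 4:2:0 frame: one full-size luma plane plus two
    quarter-size chroma planes, i.e. 1.5 samples per pixel."""
    return width * height * 3 // 2 * bits // 8

MB = 1024 * 1024
print(frame_bytes(1920, 1080) / MB)  # ~3 MB per 1080p frame
print(frame_bytes(3840, 2160) / MB)  # ~12 MB per 4K frame
```

So even a handful of 4K reference frames is on the order of tens of megabytes, which at least plausibly fits in such a cache; whether the hardware actually keeps them there is a separate question.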
Indeed. For example, hardware decoding is the difference between choppy video and smooth video on the PinePhone because the CPU isn't powerful enough and the GPU is useless for decoding.
(And to fguerraz's edit that their comment doesn't apply to mobile phones "where manufacturers control hardware and software end to end", the manufacturer does not control the software on the PinePhone.)
If it was just shaders then there'd be basically no concerns with driver quality or hardware support, just like there aren't with CPU decoders.
The only hybrid VP9 decoders were AMD's, which only supported Windows and which they stopped shipping years ago (any current/Linux AMD drivers that support VP9 decoding do so only via an ASIC), and Intel's, which was only supported on 3 generations of GPUs (Gen7.5, Gen8, and Gen9) and was obsoleted by an ASIC in Gen9.5.
I happen to have my own product with just that: software and HW-accelerated decoding. It plays videos in a few resolutions, and the presence of HW acceleration allowed me to play 4K videos (first on the market in my segment) with close to 0% CPU consumption on low-end PCs. Competitors at that stage wouldn't even dream of offering 4K content.
As to "poorer software quality": please don't spread FUD. I just looked at the source code; the HW-accelerated path (decoding from source to a DirectX texture) added a minuscule 1200 lines of code, a good chunk of which are headers/declarations. The software is used by tens of thousands of clients and I have about zero reports of enabling HW decoding leading to an error.
Power reduction is not really questionable. You can't really achieve smooth playback at full-res without VPU on devices where these things are used.
Software without functionality is really simple!
The same argument applies to supporting Unicode, both text directions, high-DPI scaling, catering to the visually impaired, or having any sound more complex than MIDI.
Only if you are not using any abstraction layers. GStreamer should take care of using a hardware decoder if available, and otherwise fall back to software decoding.
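The pattern such an abstraction layer follows can be sketched simply: rank the candidate decoders, try the most preferred first, and fall back on failure. (GStreamer does this with element ranks during autoplugging; the function and decoder names below are hypothetical stand-ins, not real GStreamer API.)

```python
def open_decoder(codec, candidates):
    """candidates: list of (rank, name, open_fn); higher rank preferred.
    Returns (name, context) for the first decoder that opens."""
    for rank, name, open_fn in sorted(candidates, reverse=True):
        try:
            return name, open_fn(codec)
        except RuntimeError:
            continue  # e.g. driver missing, profile unsupported
    raise RuntimeError(f"no decoder available for {codec}")

def hw_open(codec):
    # Simulate a machine without a working VAAPI driver.
    raise RuntimeError("no VAAPI driver")

def sw_open(codec):
    return f"sw-{codec}-ctx"

# HW decoder is ranked higher but fails; we silently fall back to SW.
name, ctx = open_decoder("vp9", [(256, "vaapi", hw_open),
                                 (0, "software", sw_open)])
print(name)  # software
```

The application above this layer never needs to know which path was taken, which is the point of the grandparent's argument.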
No it isn't. There's a reason it's used on 99% of consumer devices. Hardware companies are generally not in the business of adding to the BOM cost for no reason. Linux alone is the outlier.
> It's not always much more energy efficient, but it sometimes is, but less than you'd think, GPUs need power too
"As you can see a GPU enabled VLC is 70% more energy efficient than using the CPU!"
chrome-hw shows 1/4 the power consumption of chrome-sw on the same video on the more recent Apple M1: https://singhkays.com/blog/apple-silicon-m1-video-power-cons...
Also, hardware decoders have consistent performance, which is not true of CPU-based decoders. This is especially problematic and obvious at high resolutions. Windows and macOS ultrabooks can do 4K video all day long without an issue; Linux ultrabooks get noticeably choppy at 1440p, and 4K is right out.
This is also why you'll find ultra-low-end SoCs regularly prioritizing hardware decoders over faster CPUs, notably those in every smart TV and the majority of TV streaming dongles/sticks/boxes. Which really shouldn't be surprising: fixed-function hardware has always been drastically more efficient than programmable hardware, and video has changed nothing about that.
> 2) It increases greatly the complexity of client software that has to implement both accelerated and unaccelerated decoding, leading to poorer software quality
Sounds like a job for a library, which is how every other OS makes this a non-issue.
> 4) HW support usually lags behind state of the art encoding. Youtube is already using av1, but the vast majority of devices won't support it in hardware before something else comes up
YouTube also still uses VP9, so power efficiency didn't regress on existing hardware, and mid-tier TV SoCs with AV1 decoder support are already here (such as the Amlogic S905X4). Sony's 2021 BRAVIA XR line also has HW AV1 decoders up to 4K.
> 5) Highly optimised decoders, such as dav1d, are extremely effective and save bandwidth and power compared to HW VP9.
Care to back that up with a source? All I can find are statements that dav1d is fast, but no evidence that it is efficient. The only thing I can find is this: https://visionular.com/en/av1-encoder-optimization-from-the-...
which shows dav1d using more power than ffmpeg-h264 but less than openhevc; those are also software decoders, though, which, as above, take significantly more power than hardware decoders for the same codecs.
I can run multiple 1080p Twitch streams with mpv using streamlink and appropriate decoder flags, while using Chromium to watch even one stream puts a lot of strain on my laptop and gets the fan running immediately.
So from my perspective it is very useful to offload video decoding to the GPU and leave CPU cycles for other work. Is it more energy efficient? I never checked, but the GPU fan doesn't really spin any faster, and looking at the temperature graphs it doesn't seem to really strain it.
I tried enabling GPU acceleration for the browser (Chromium-based), and I still don't really know why it is so flaky and unreliable.
Might just have to sit down and figure out how to cross compile Gentoo for it.
No support for external displays, though.
Codecs can be in software, but can also be implemented in hardware which is much more power efficient. This change enables the Linux kernel to use hardware VP9 decoders so that software can decode (play) VP9 video much more efficiently when that hardware is available.
The title makes it out to be something fundamental in Linux, but this is just one driver becoming more complete.
I would be glad if everybody used it so it would become mainstream, but the reality is H.264 and H.265.
And it doesn't have to be a userspace application, it can be a userspace library - e.g. something like GStreamer.
It also supposedly makes sense to force H.264 to increase the chances of hardware acceleration being used.
That's just not true.