
Dav1d: performance and completion of the first release - rbultje
http://www.jbkempf.com/blog/post/2018/dav1d-toward-the-first-release
======
twotwotwo
It's super neat to see that desktop-class machines should be able to play
1080p AV1 fine with zero hardware support.

I think the lack of mention of GPUs in the post means the answer will be "no",
but is this an area where open-source folks could realistically someday lean
on the GPU for any help with decoding at all?

I see mentions of CPU/GPU "hybrid decoding" from GPU vendors, but can imagine
that might only be something realistically possible with the lower-level
access to the GPU the vendor's own driver team has, not via the documented
shader languages and APIs.

~~~
CyberDildonics
Why would it take 'low level' GPU access to accelerate video decoding? OpenGL
has had compute buffers for years now.

~~~
twotwotwo
The motivating observation here is that I know of a few GPU vendors offering
hybrid decoding for HEVC and VP9, but no hybrid decoders put together by the
open-source community. (Counterexamples are interesting!)

Reasons a GPU vendor might be better able to do this sort of thing than an
outsider who can sling OpenGL include: 1) some hybrid decoders are described
as leaning partly on special-purpose video decoding hardware, which tends to
be a black box to us, and 2) more-detailed understanding of and access to the
details of the hardware might let you efficiently express something that's
inefficient or awkward in just GLSL--in other words, same kind of reason
people care about Metal/Vulkan vs. OpenGL or asm vs. C.

(The further down in the weeds I get the less sure I am of precise technical
correctness, but a couple concrete things that seem to make shaderizing
decoding tricky are: 1) AV1 has a ton of control-flow-y elements--blocks can
be split many different ways and be different sizes, and there are lots of
prediction modes--and branchy code can be bad for shader efficiency, and 2)
some things seem to block parallelism, e.g. for intra prediction you need the
blocks you're predicting from before you can do predictions for the next
block. And given the CPU-GPU transfer latency you can't ping-pong back and
forth at will; you need large chunks that run well strictly on the GPU. It
could be that pieces like the transforms and post-filtering can be cleanly
separated into GPU steps, though.)

An efficient open-source AV1 decoder based just on OpenGL/GLSL would be great!
But since it wasn't mentioned as an ambition in the post, community-written
hybrid decoders seem rare, and we had an expert on AV1 decoders in the
thread, it did not seem unreasonable to me to ask how realistic it was.

Though if you manage to write an open-source OpenGL-accelerated AV1 decoder,
that would definitely answer my question and leave everyone happy. :)

------
jbk
I'm the author, so if you need anything, just ask.

~~~
ComputerGuru
Realistically speaking, comparing an x265 (HEVC) encode and an AV1 encode
producing a video of similar quality but ~20% smaller, what is the difference
in encoding time?

~~~
ktta
You'll want to check out rav1e (
[https://github.com/xiph/rav1e](https://github.com/xiph/rav1e))

Here's a comment that gives a clue -
[https://news.ycombinator.com/item?id=17539791](https://news.ycombinator.com/item?id=17539791)

------
cornstalks
Congrats to everyone on the progress, and a huge thanks from me to all the
devs who are working on this! Are there any performance comparisons with dav1d
(AV1) vs ffvp9 (VP9)? I’m curious how expensive decoding AV1 is compared to
VP9 (in software) (and I’m hoping someone else has already done the
benchmarking so I won’t have to).

~~~
jbk
It is a bit more expensive, but not by much, for the same quality (i.e. less
bitrate). For the same bitrate, it's 25-30% more expensive.

No actual measurements, just a feeling from what we've seen.

------
BlackLotus89
> Therefore, the VideoLAN, VLC and FFmpeg communities have started to work on
> a new decoder

Is there a need to separate VideoLAN and VLC?

Anyway, nice progress; I didn't expect such good results so soon. My main
question right now is what the slowest system is on which AV1 is still
playable. I know that older-CPU and ARM optimizations are on the horizon ("On
the other platforms, SSE and ARM assembly will follow very quickly, and we're
already as fast on ARMv8."), but I'm curious whether my Raspberry Pi/ODROID
will ever be able to play 1080p AV1 videos.

~~~
jbk
> Is there a need to separate VideoLAN and VLC?

Yes, the communities are not the same. VideoLAN has numerous people not
working on VLC.

> Raspberry Pi/ODROID will ever be able to play 1080p AV1 videos.

Raspberry Pi? No. A recent ODROID, yes.

~~~
naikrovek
> VideoLAN has numerous people not working on VLC.

Whoa, what? What else is going on? Oh, x264 & x265, I bet.

~~~
buovjaga
[https://www.videolan.org/projects/](https://www.videolan.org/projects/)

------
polskibus
Is this written in Rust? If so, did any particular Rust features help a lot in
this achievement, in comparison to writing the code in C or C++?

~~~
nindalf
You're thinking of rav1e, which bills itself as "The fastest and safest AV1
encoder"

[https://github.com/xiph/rav1e](https://github.com/xiph/rav1e)

~~~
dralley
Being a decoder, they probably place a high priority on having the widest
possible platform support. C is still top dog in that respect.

~~~
w-m
I'm curious, what kind of platforms do you have in mind, that

a) can be targeted by C, but not by Rust

b) provide enough performance to make porting a next-gen video decoder a
worthwhile exercise?

~~~
ncmncm
Thousands of special-purpose, minimally-featured embedded systems. You don't
notice them because they are invisible, and they are invisible because they
"just work". High-enough-volume products get a decoder chip or a section of a
gate array, but most are low-volume and can barely afford the ROM for the
code.

