Rav1e: An experimental AV1 video encoder, designed to be fast and safe

kristofferR · on July 16, 2018

What does "safest" mean in this context?

jabl · on July 16, 2018

Presumably that it's implemented in Rust, and thus does not suffer from many of the usual bugs that C code bases suffer from?

There seems to be some asm code as well, which obviously does not enjoy the safety advantages of Rust.

arez · on July 16, 2018

I always wondered if in practice it really checks out, is rust code really safer? Did the bugs just shift to different ones? Is there anyone who wrote about that already?

masklinn · on July 16, 2018

It's probably not quite equivalent but I believe Frederico & al found & fixed a number of issues when they converted librsvg to rust. You may want to check the archives on their blogs (IIRC it's split between Frederico's own and the librsvg one) for more, or specifics.

But "safe" rust at least intrinsically protects from use-after-free, double free, dangling pointers, null-pointer dereferences, out-of-bounds accesses, … You may still have logic bugs of course (though the richer type system expressivity also allows better static encoding of application & domain logic), but these baseline memory-safety issues will only be a problem in specific and tagged `unsafe` blocks rather than throughout the application.

kevincox · on July 16, 2018

While I do believe that rust will prevent many memory safety issues. I think you would probably catch a bunch of safety issues rewriting any code.

ryanmarsh · on July 16, 2018

I think you would probably catch a bunch of safety issues rewriting any code.

As well as introduce new ones.

derefr · on July 17, 2018

That's only if you rewrite "from the design" in a fresh repo.

If, on the other hand, you rewrite the code line by line from C to Rust (which Rust is actually quite amenable to!), faithfully translating the semantics of the C code into the Rust code (and thereby having to use lots of unsafe{} blocks) then you can avoid most of the problems—what you'll end up with will essentially be the same as what a hypothetical C-to-Rust transpiler would output.

Importantly, you can also translate the test suite in this same way.

After that step is done, you can just refactor the resulting code, replacing unsafe{} blocks with Rust idioms, and then rerunning the tests (which you can leave un-idiomatized) as regression tests.

kevincox · on July 16, 2018

Of course. I would suspect that with Rust you would introduce less new ones but I just wanted to point out that finding bugs isn't evidence that the language helped you out.

Even though I am a believer I would love to see some solid evidence of Rust making code safer.

brian-armstrong · on July 16, 2018

All of the pointer issues you mentioned have already been solved in C++ by unique_ptr.

pjmlp · on July 16, 2018

Good, now try to enforce developers to actually use it on their code and all third party libraries they link to.

Ah, and not passing it around by address or reference, instead of actually moving ownership.

C++17 is a great improvement, but for it to work out in this context, developers need to actually write C++ instead of "C with C++ compiler".

brian-armstrong · on July 16, 2018

This is a silly argument. When evaluating whether to use a tool, you should consider how /you/ would use the tool, not how someone else does.

pjmlp · on July 17, 2018

Yeah, it kind of works out in the ideal world where one works alone, writing 100% of the source code.

stouset · on July 16, 2018

Right up until the moment you have third-party dependencies.

dbaupp · on July 16, 2018

Pretty much only use-after-free and double-free of that list is truly solved by unique_ptr, which is great, but nothing like what you say:

- std::move out of a unique_ptr x and "*x" is a null pointer dereference,

- take a reference into a unique_ptr, and it becomes dangling if it is held after the pointer is deallocated,

- an T[N] array doesn't get bounds-checked whether or not it is stored in a std::unique_ptr (it is nice that it reduces the number of raw pointers flying around, which likely does reduce the number of out-of-bounds accesses, but it doesn't solve them)

brian-armstrong · on July 16, 2018

Clang does a good number of warnings/errors on unique_ptr, and I'm not sure your first point is actually right - it tends to be pretty hard to use after move.

If you want to pay the cost of runtime array checking every time, an impl in operator[] is just a few lines long. Thankfully it is becoming more common to run programs under ASAN in dev mode.

dbaupp · on July 17, 2018

It is trivial to use-after-move. The following compiles completely without warnings with the clang on my system (even with -Wall -Weverything), and segfaults:

  #include <memory>
  
  int main() {
    std::unique_ptr<int> p = std::make_unique<int>(0);
  
    std::unique_ptr<int> q(std::move(p));
  
    return *p;
  }

Maybe you mean it doesn't happen much in practice, which might be true (although, we need to be comparing use-after-move with use-after-free etc., which also don't happen that much, per line-of-code), but is a different point. The fact remains that unique_ptr doesn't solve use-after-move.

> If you want to pay the cost of runtime array checking every time, an impl in operator[] is just a few lines long

This is also a different point, and moving the goal posts.

In any case, which operator[] exactly? AFAIK, neither T[N] nor T* can have a custom operator[] (and getting the bounds of a raw pointer is essentially impossible), and none of the operator[]s of std::array nor std::vector nor std::span do bounds checking (sure, you can use ...::at for the first two, but you have to remember to do that everywhere, going against the default that pretty much every programming language uses).

> Thankfully it is becoming more common to run programs under ASAN in dev mode.

Yes, this is great! However, this is yet another different point, and it is orthogonal to modern C++ and its fancy new features like std::unique_ptr: C++98 code, and even C code benefit from ASan.

pcwalton · on July 17, 2018

> I'm not sure your first point is actually right - it tends to be pretty hard to use after move.

    std::unique_ptr<int> foo(new int(10));
    
    void bar(int& x);
    
    int main() {
        bar(*foo);
        return 0;
    }
    
    void bar(int& x) {
        foo = std::unique_ptr<int>(new int(20));
        std::cout << x; // use after free
    }

No warnings on any warning level.

pcwalton · on July 17, 2018

unique_ptr helps with one thing and one thing only: memory leaks.

The list you were responding to didn't include memory leaks.

derf_ · on July 16, 2018

The "safest" comment is mostly tongue-in-cheek, but the advantages are real. I enjoyed this comment from someone contributing to the project for the first time:

< barrbrain> atomnuker: I felt good when my patch was done and I had exactly what I set out to write.

< barrbrain> I didn't enjoy the compiler forcing me to fix all my bugs upfront.

There are also advantages other than safety to using a modern systems language that has learned something from the last 30 years of programming language design but doesn't come with all of the baggage of C++.

manmal · on July 16, 2018

Having coded full-time in Swift for 2 years now, I can attest that strongly-typed languages really do a lot for code quality and especially reliability. Sure, you still have semantic bugs all over (that's what tests are for), but whole classes of bugs (null-pointer refs, dangling pointers) are at least greatly reduced. As long as I stay away from the no-nos (like implicitly unwrapped optionals) and use the goodies (immutable models; enums with attached values; generics), my apps hardly ever crash now, once out in the store.

coldtea · on July 16, 2018

>I always wondered if in practice it really checks out, is rust code really safer? Did the bugs just shift to different ones?

Bugs don't magically "shift". If you eliminate a class of bugs, it's gone (e.g. memory bugs).

Whether you can still have other bugs (e.g. logic bugs) that's irrelevant, you could still have those in C as well.

davrosthedalek · on July 16, 2018

Or, the way you eliminated one class opens up or makes more likely a different class. Maybe avoiding pointers makes it more likely to introduce logic bugs. It could also make some algorithm too slow, which one could call a performance bug.

I don't think it's particular likely, but there are always trade-offs.

pjmlp · on July 16, 2018

As proven by Ada, Modula-2 and a few other systems languages, you don't need to scatter pointer arithmetic all over your code to achieve the same machine language output.

tzahola · on July 16, 2018

>Bugs don't magically "shift".

It depends... You can’t have dangling pointers in Java - instead, you can “leak” memory by holding onto your objects even when they’re not needed anymore.

elteto · on July 16, 2018

I don't think you are using "instead" correctly in that sentence.

You are implying that not being able to have dangling pointers in Java (and thus eliminating an entire category of bugs) inevitably leads to the new category of bugs of leaking memory by holding on to references. But that is the case in any language that allows dynamic memory allocation. You can leak memory in Rust, C++ and VB if you don't dealloc.

Dylan16807 · on July 16, 2018

A program with more memory controlled by GC is generally going to have more leaks like that.

The switch reduces bugs some, and definitely reduces the severity, but I would largely put the bugs in the same category and say it hasn't been eliminated.

coldtea · on July 16, 2018

Part of my point would be that you can have that in C as well.

pjmlp · on July 16, 2018

Yes, but not only Rust, rather any memory safe system programming language all the way back to ESPOL in 1961.

In those languages you have "Logic errors" to debug, in C and its direct descendants you have "Logic errors" + "UB errors" + "Memory corruption errors" to debug.

Also having the use of unsafe explicit in the type system, means that it is relatively easy to track down where such issues might occur, whereas in C every line of code can be a source of problems, depending on which compiler and flags are being used.

Again, this applies to all memory safe system programming language, not just Rust.

EDIT: Fixed the NEWP reference, as ESPOL came first and NEWP only replaced it in 1976.

coldtea · on July 16, 2018

>Yes, but not only Rust, rather any memory safe system programming language all the way back to ESPOL in 1961.

Yes, but now we're discussing only those that practically matter to more than 5 people today.

pjmlp · on July 16, 2018

I bet Unisys MCP deployments still matter to more than 5 people.

Plus the point was to talk about safe systems programming languages in general.

Or should we ignore history just because UNIX won on the server?

tzahola · on July 16, 2018

What’s this “UNIX”? A Linux distribution?

pjmlp · on July 16, 2018

I know you are trolling, but there you go.

https://en.wikipedia.org/wiki/History_of_Unix

https://en.wikipedia.org/wiki/Unix

https://en.wikipedia.org/wiki/POSIX

http://www.opengroup.org/standards/unix

coldtea · on July 16, 2018

His point is probably UNIX didn't won -- Linux did.

pjmlp · on July 16, 2018

Linux is non-certified copy of UNIX, without POSIX it wouldn't matter in the marketplace, ergo UNIX won.

vetinari · on July 16, 2018

As you can see in the wild, nobody really cares about UNIX certification. Ergo, UNIX didn't win.

pjmlp · on July 16, 2018

The fact is that the Linux kernel without the APIs copied from POSIX/UNIX and the userspace copied from UNIX is useless on the server space.

Of course we can play semantic games about the use of UNIX word, and GNU/Linux not sharing any code from either BSD or AT&T linage, it doesn't make it any less UNIX.

Had Linux not copied UNIX, and provided a playground for UNIX vendors kind of outsource their development costs, it would have turned out to yet another hobby OS.

vetinari · on July 16, 2018

NT had POSIX subsystem too.

What Linux (and NT POSIX to certain degree) allowed, was to move the existing investment in software from hodge-podge of mutually incompatible, but expensive UNIX systems, to somewhere else.

pjmlp · on July 17, 2018

NT isn't a copy of UNIX, just because it had support for the first edition of POSIX, it goes beyond than that.

Starting by wanting to copy Minix, an educational OS that copied UNIX, getting the GNU tools on board (which goal was to copy UNIX), kernel architecture, device drivers subsystems, userland, culture, ....

My old stack of Linux Journal editions at home are pretty clear what the ongoing story was.

If NT was a copy of anything, it was VMS given Dave Cutler's contribution to its design.

AndrewDucker · on July 16, 2018

"Written in Rust"

_8huj · on July 16, 2018

I'm trying to get my insurance discounted on my car. I should slap a "written in Rust" sticker on it. Safe driver discount?

pleasecalllater · on July 16, 2018

It makes sense. If your car is covered with rust, you should be almost sure that no one will even try to steal it. :)

Crespyl · on July 16, 2018

Given that automotive control software is apparently a huge mess right now, and a rapidly growing attack surface, a more disciplined rewrite in Rust might actually be enough for an insurance company to consider...

CUViper · on July 16, 2018

PDXRust just had a related talk, "Writing Software That's Safe Enough To Drive A Car": https://www.meetup.com/PDXRust/events/252160920/

skolemtotem · on July 16, 2018

Will AV1, or even VP9 for that matter, ever be suitable for realtime encoding, or is that just not their target market?

matt4077 · on July 16, 2018

Yes, of course. Anything else would be DOA.

On a general note: there really seems to be an extremely inaccurate narrative regarding AV1 and speed taking hold. I can't understand why it isn't easier understood that a reference implementation is about accuracy only, completely ignoring performance considerations. Not in the usual "we'll now try to make it faster", but as in "this is never meant to be used in production, and it's performance is in no way indicative of the performance optimised encoders will see".

As but one example: media encoding is pretty close to being "embarrassingly parallel" in principle, making the first three orders of magnitude easy wins for a straightforward GPU implementation.

Ace17 · on July 16, 2018

Video compression engineer here.

> I can't understand why it isn't easier understood that a reference implementation is about accuracy only, > completely ignoring performance considerations.

Because the official codebase conveys another message. Have a look, there are SIMD implementations for almost all supported targets.

https://aomedia.googlesource.com/aom/+/av1-normative/aom_dsp... https://aomedia.googlesource.com/aom/+/av1-normative/aom_dsp... https://aomedia.googlesource.com/aom/+/av1-normative/aom_dsp... ....

What are these files for, if not performance? They've been maintained and kept synchronized with the reference C code during the whole project, long before the codec was frozen (and it was a huge PITA).

This doesn't look like "completely ignoring performance considerations".

> As but one example: media encoding is pretty close to being "embarrassingly parallel" in principle,

Almost all video codecs exploit some block-level encoding context, which means the way you encode one block depends on how the previous neighbooring blocks were encoded. This creates a huge dependency between blocks. There are tools like slicing/tiling that allow you to break these dependencies, and thus, encoding in parallel, but at the cost of video quality. Making the problem "embarrassingly parallel" at this point would make the video "embarrasingly ugly".

You could encode multiple frames in parallel ; but then again, being able to encode them independently means you're basically trashing all the compression context (reference frames), and your video quality goes down the tubes.

In an offline encoding scenario (Netflix, Youtube), if you have lots of memory, you can encode multiple independent video sequences from the same movie. Making the problem "embarrassingly parallel" in this case would require an "embarrassingly huge" amount of memory. Also, it's not applicable to a live scenario (think: latency).

Ienuur4i · on July 16, 2018

> media encoding is pretty close to being "embarrassingly parallel" in principle

My understanding is that there are some fairly tight feedback loops in the encoders that make it difficult to offload things to the GPU, at least if you want to maximize the quality per byte metric. If you want to target realtime and don't need optimal compression it probably gets easier.

Jasper_ · on July 16, 2018

> As but one example: media encoding is pretty close to being "embarrassingly parallel" in principle

Which part? 90% of what you're doing is context or inter-frame dependent. Video encoders that live on graphics cards today use dedicated ASIC hardware.

clouddrover · on July 16, 2018

You can divide the video into chunks and encode the chunks in parallel. This is what Netflix does:

https://medium.com/netflix-techblog/high-quality-video-encod...

https://medium.com/netflix-techblog/dynamic-optimizer-a-perc...

Works well when you're doing video at the scale of Netflix, but not necessarily much help to the individual user who just wants to encode a video.

Jasper_ · on July 16, 2018

> You can divide the video into chunks and encode the chunks in parallel

You can do this with zlib too (zlib divides a file up into 64k chunks). Doesn't mean that zlib is well-suited for GPUs, nor is each chunk "embarrassingly parallel". Neither Netflix post talks about using the GPU at all.

Ace17 · on July 16, 2018

> You can divide the video into chunks and encode the chunks in parallel.

What about live encoding?

clouddrover · on July 16, 2018

You can split the encoding across 32 cores:

https://bitmovin.com/constantly-evolving-video-landscape-dis...

https://bitmovin.com/bitmovin-supports-av1-encoding-vod-live...

pas · on July 16, 2018

People are pragmatic, at least in this regard. They don't really suffer from the bandwidth costs, they want fast encode speeds for offline storage.

And they are simply cautious. They don't really care about the hype x264 is good enough visually, now all visual comparisons are done on ridiculously low bitrate (which is a good thing, but people don't really care).

TD-Linux · on July 16, 2018

I actually had a realtime rav1e demo running, with some modifications. There's nothing inherently less realtime about AV1 than, say H.264.

derf_ · on July 16, 2018

There are a number of features that make AV1 structurally more suited to real-time implementations than its predecessor, VP9.

For example, it does adaptive entropy coding instead of explicitly coding probabilities in the header. That means that you don't need to choose between making multiple passes over the frame (one to count symbol occurrences and one to write the bitstream using R-D optimal probabilities) or encoding with sub-optimal probabilities (which can have an overhead upwards of 5% of the bitrate). libaom has always been based on a multi-pass design, as was libvpx before it, but rav1e only needs a single pass per frame (we may add multiple passes for non-realtime use cases later).

In another example, AV1 has explicit dependencies between frames. VP9 maintained multiple banks of probabilities which could be used as a starting point for a new frame. But any frame was allowed to modify any bank. So if you lost a frame, you had no idea if it modified the bank of probabilities used by the next frame. In AV1, probabilities (and all other inter-frame state) propagate via reference frames. So you're guaranteed that if you have all of your references, you can decode a frame correctly. This is important if you want to make a low-latency interactive application that never shows a broken frame.

Some of its tools also become more effective in low-complexity settings. One of the new loop filters, CDEF, gives somewhere around a 2% bitrate savings using objective metrics when tested with libaom running at its highest complexity (although subjective testing suggests the actual improvement is larger). However, when you turn down the complexity, the improvement from CDEF goes up to close to 8%. I.e., using this filter helps you to take shortcuts elsewhere in the encoder.

The real reason the reference encoder is so slow is that it searches a lot of things. You can always make things run faster by searching less. Take a look at http://obe.tv/about-us/obe-blog/item/54-a-look-at-the-bbc-uh... to see how drastically people are limiting HEVC to make it run in real time today (though if you have to go up to 35 Mbps to do so, one might wonder what the point is).

JustFinishedBSG · on July 16, 2018

Well eventually we'll get AV1 hardware encoding so I'd say in 5 years smartphones will encode AV1 in real time

sp332 · on July 16, 2018

Yes, it's just that everyone is focusing on size/bandwidth optimization for now. Once they nail down the actual format, projects will start work on making it fast.

masklinn · on July 16, 2018

I believe the possibility of "making it fast" is taken in account in the existing design, to avoid designing a format which can't be cheaply optimised & hardware-implemented.

sp332 · on July 16, 2018

Right, the encoder is given a lot more options, which leads to a combinatorial explosion when searching for an optimal encoding. Once the options that actually pay off are identified, encoders will be able to tune their heuristics and narrow down the search space.

masklinn · on July 16, 2018

AFAIK realtime encoding is a design consideration for AV1.

p0nce · on July 16, 2018

Their target market is "negotiating with MPEG for lower royalties"

zakk · on July 16, 2018

> ~5 fps encoding @ 480p

How does this compare with the reference encoder?

masklinn · on July 16, 2018

You'd need to run them on the same machine to make sure you get a proper comparison but https://ffmpeg.zeranoe.com/forum/viewtopic.php?t=5601 has some runs. One of the users downthread ("entac") provides both libaom and libx264 numbers, they get 63fps for libx264 and 0.0924fps for libaom (r9028)

Also this currently does delegates work to libaom.

derf_ · on July 16, 2018

> Also this currently does delegates work to libaom.

Currently just for the transforms and to initialize the probabilities for the entropy coder.

zakk · on July 16, 2018

Many thanks!

tatref · on July 16, 2018

The readme didn't specify the --release flag in the cargo run command, this should make a pretty big difference

steveklabnik · on July 16, 2018

I sent a PR; as they merged, they mentioned their test scripts use it...

sargun · on July 16, 2018

Just curious, what's the memory footprint on the encoder in real life?

Do different video encoders, for the same codec, and input produce different outputs, or is the algorithm specified in a way where it produces the same results for two given inputs, no matter what?

wolf550e · on July 16, 2018

For almost all compression algorithms (both lossless and lossy), only the decompression is specified. A compressor can do whatever it wants as long as it produces a bitstream that a compliant decompressor can decode.

For example, you can make a video enocoder that produces a compliant video stream in which every frame is a keyframe and every macroblock is independently fully encoded, thus reducing AV1 (or H265, etc.) to MJPEG. But if the result is decodable by a compliant decoder, your compressor is compliant. It might even be somewhat useful (e.g. output needs to be zero latency or output is intended to be edited).

tzahola · on July 16, 2018

And this is how people keep making substantial improvements to ancient formats like JPEG or MP3.

E.g. Guetzli from Google: https://github.com/google/guetzli/blob/master/README.md

TD-Linux · on July 16, 2018

Virtually all of the consumed memory is from the reference frames, so it depends on how big your video is. AV1 supports up to 8 reference frames, though if you're memory constrained you could use less.

Different encoders produce different outputs, the algorithm isn't specified.

Pissompons · on July 16, 2018

> Do different video encoders, for the same codec, and input produce different outputs, or is the algorithm specified in a way where it produces the same results for two given inputs, no matter what?

Someone else answered this but I thought I'd elaborate: A good way to think about a codec is as a toolbox. The specification tells you which tools you can use to build a frame (encoder) and which you must support to turn one into pixels (decoder).

Which tools are used in what way makes a huge difference in the output of the encoder, particularly in terms of compression. Have a look at the results for a few H.264 encoders [0]. For the "video conferencing" use-case, the best encoder (x264) uses ~400kbps to produce the same quality of the worst at ~1000kbps.

And like different tools have different costs (a jackhammer needs a generator, a handheld hammer does not), so do the tools in the codec toolbox. Some tools might make the encode slower or might make the decode drain more battery from a mobile device. Others might take a lot of physical space on a piece of silicon, so they're rarely used in hardware.

So different encoders have very different characteristics, not just in terms of output but in terms of power usage, speed and complexity as well.

[0]: http://www.compression.ru/video/codec_comparison/h264_2012/

conradev · on July 16, 2018

Different video encoders certainly produce different outputs. Even the same video encoders produce different outputs. Here is a good paper that explains why that is, and talks about how to make a deterministic video h264 encoder:

http://www.ndsl.kaist.edu/~kyoungsoo/papers/mmsys14.pdf