Great work by the authors; it's good to see this kind of effort written up in detail. Video codecs are complicated, so I'm glad somebody did this. It would be cool if there ends up being a presentation along the lines of "here are the bad patterns you should avoid if you want to be able to port your C code to Rust at some point" (covering things beyond what you actually had to do on this project, as well as what you mentioned in the post). Also, I'm glad you mentioned how long this took; it's nice to get a sense of how hard these ports are.
I don't think we make either of those errors. Obviously if the C caller does weird stuff, all bets are off, but we don't mutate through `&T`s or read uninitialized memory. `MaybeUninit` is only used in a few isolated and carefully checked places. Most of the rest of the buffers are zero-initialized, which the kernel usually does for free.
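To make the two buffer styles concrete, here is a minimal sketch, not code from the project. The function names and sizes are made up for illustration. It shows a zero-initialized buffer next to a small, self-contained `MaybeUninit` use where every element is written before anything is read.

```rust
use std::mem::MaybeUninit;

/// Zero-initialized buffer (hypothetical helper): every byte is a valid `u8`
/// from the start, so there is nothing uninitialized to read by accident.
pub fn zeroed_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

/// Isolated `MaybeUninit` use (hypothetical helper): the array starts out
/// uninitialized and is fully written before it is converted.
pub fn fill_squares() -> [u32; 8] {
    let mut buf: [MaybeUninit<u32>; 8] = [MaybeUninit::uninit(); 8];
    for (i, slot) in buf.iter_mut().enumerate() {
        // `write` stores a value without reading the old (uninitialized) one.
        slot.write((i * i) as u32);
    }
    // SAFETY: the loop above initialized every element of `buf`.
    buf.map(|slot| unsafe { slot.assume_init() })
}
```

The `unsafe` is confined to one line, and the invariant it relies on (the loop covers the whole array) can be checked by reading a single function.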
I'm thinking more about the differences between how C and Rust use noalias, but I can't find an example of it going wrong in real translated code right now.
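For anyone unfamiliar with the difference, a hypothetical sketch (not taken from any real translated code; both function names are invented): in C, two pointers may alias unless marked `restrict`, whereas in Rust a `&mut T` is always assumed not to alias anything else.

```rust
/// C-style signature: `dst` and `src` are raw pointers and may point at the
/// same `i32`, just like two plain `int *` parameters in C.
///
/// # Safety
/// Both pointers must be valid for reads, and `dst` for writes.
pub unsafe fn add_twice_raw(dst: *mut i32, src: *const i32) {
    // SAFETY: validity of both pointers is guaranteed by the caller.
    unsafe {
        *dst += *src;
        // If the pointers alias, this second read sees the updated value.
        *dst += *src;
    }
}

/// Idiomatic signature: `&mut i32` promises exclusive access, so the compiler
/// may assume `*src` does not change when `*dst` is written.
pub fn add_twice(dst: &mut i32, src: &i32) {
    *dst += *src;
    *dst += *src;
}
```

Calling `add_twice(&mut x, &x)` is rejected by the borrow checker. The risk in a translation is turning aliasing raw pointers into `&mut` references inside `unsafe` code, which the compiler cannot check and which is undefined behavior.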
Rust projects are pretty easy to deploy. It's just LLVM underneath, and the product is similar to clang-built code. You get a library, static or dynamic, that you can link with anything that can link with C.
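As a rough sketch of what that looks like (the function name and its behavior are invented for illustration): a function exported with the C ABI, which a C program can declare and link against once the crate is built with a `staticlib` or `cdylib` crate type.

```rust
use std::os::raw::c_int;

/// Hypothetical exported function. From C it would be declared as:
///     uint8_t clamp_pixel(int value);
/// `#[no_mangle]` keeps the symbol name as written so the C linker can find
/// it (edition 2024 spells this attribute `#[unsafe(no_mangle)]`).
#[no_mangle]
pub extern "C" fn clamp_pixel(value: c_int) -> u8 {
    // Clamp into the 8-bit range before narrowing, so the cast cannot wrap.
    value.clamp(0, 255) as u8
}
```

The same function stays callable from Rust as an ordinary function, so it can be unit-tested without involving C at all.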
Rust projects are much easier to build, especially when supporting multiple platforms. I've converted projects to Rust to make them easier to build and deploy.
Does Cargo re-use dependencies today? Last time I tried to build medium-sized Rust projects, it pulled hundreds of dependencies each time, even the same ones. It took up too much space, and took too long to build.
That would only work if the C code was treated as a generated artifact and not touched directly. If the C code is worked on directly it will be just as susceptible to unsafe changes as before.
Comparably easy to deploy, perhaps, but easier? Is there some scenario you have in mind where it would be worth the overhead of that complex extra step?
Codecs, like browsers, are handling malformed and potentially hostile data non-stop. And like browsers, they tend to be widely deployed. Further, like cryptographic software, they do a lot of parsing and complex math on their inputs. Writing safe parsers is notoriously difficult, and it is a job ideally suited to memory-safe languages and parser generators. All this adds up to a high likelihood of idiosyncratic bugs, which are attractive to exploit.
This is why Apple disables most codecs in iMessage in lockdown mode.
"...can Rust achieve the memory-safety promise? This paper studies the question by surveying 186 real-world bug reports collected from several origins which contain all existing Rust CVEs (common vulnerability and exposures) of memory-safety issues by 2020-12-31. We manually analyze each bug and extract their culprit patterns. Our analysis result shows that Rust can keep its promise that all memory-safety bugs require unsafe code..."
You can't change the environment codecs are expected to run in, or the data they are used to process, but you can use effective tools to limit the sorts of bugs you are likely to run into, and isolate them to the relatively small areas inside `unsafe` blocks. Exploitable C-style memory errors simply aren't possible in memory-safe languages like Python, for instance, but Python leaves performance on the table, and performance matters for codecs. Rust provides a high degree of memory safety while performing as well as C or C++.
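A small sketch of what that isolation can look like (the function and its use of unchecked indexing are hypothetical, chosen only to illustrate the pattern): the public function is safe to call with any input, and the one `unsafe` operation sits behind checks that establish its precondition.

```rust
/// Sums `len` bytes starting at `start`, or returns `None` if the requested
/// window does not fit inside `data`. Untrusted offsets cannot cause an
/// out-of-bounds read, because they are validated before the unsafe access.
pub fn sum_window(data: &[u8], start: usize, len: usize) -> Option<u64> {
    // `checked_add` rejects offsets that would overflow `usize`.
    let end = start.checked_add(len)?;
    if end > data.len() {
        return None;
    }
    let mut total = 0u64;
    for i in start..end {
        // SAFETY: `start <= i < end <= data.len()` was checked above.
        total += u64::from(unsafe { *data.get_unchecked(i) });
    }
    Some(total)
}
```

An auditor only has to verify the few lines around the `unsafe` block, and the plain safe version (`data.get(start..end)`) is a drop-in fallback if the unchecked access turns out not to be worth it.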
Rust addresses opaqueness and untestedness directly, by enforcing the language constraints at compile time. Rust addresses the privileged nature of the code and its exposure directly by disallowing the most common bugs used to gain access to that privileged execution. Obscurity and diversity can both benefit from Rust's integration of Cargo and the resultant ease with which it allows for sharing common code.
In short, expressing this code in an appropriate language can absolutely address and even go some way toward correcting each of those issues.