
How to translate a large C project to Rust - vmbrasseur
http://jamey.thesharps.us/2016/12/how-to-translate-large-c-project-to-rust.html
======
duneroadrunner
For the author:

You might consider using SaferCPlusPlus[1] as an intermediate step. Most C
code translates easily (and automatably if you're already parsing the C code)
into SaferCPlusPlus. This already gives you memory safety, if that is the
goal.

Next, you can translate to "rustesque" SaferCPlusPlus. SaferCPlusPlus provides
elements (pointer types) that (loosely) correspond to Rust elements [2]. The
borrow checker's rule of "mutable reference exclusivity" would have to be
additionally self-imposed. As you know, this is the hard (difficult to
automate) part, but the nice thing is that even partially completed
translations/rustifications would remain fully functional (and memory safe).

The idea is that "rustified" SaferCPlusPlus code should translate to Rust
fairly easily. There seems to be some skepticism of the notion of a "rusty"
C/C++ subset that could be easily translated to valid Rust. The claim seems to
be that "lifetime annotation" is essential to the way Rust achieves memory
safety without run-time overhead. But others point out that explicit lifetime
annotation is not required to evaluate memory safety [3]. It just allows the
programmer to express additional intent/expectations about object lifetimes.

Particularly for an application like CVS, I think translation to
SaferCPlusPlus is clearly the more expedient solution. Again, the benefits of
(intermediate) translation to SaferCPlusPlus: i) You get memory safety in step
1, and ii) all intermediate steps remain functional and memory safe.

[1]
[https://github.com/duneroadrunner/SaferCPlusPlus](https://github.com/duneroadrunner/SaferCPlusPlus)

[2]
[https://github.com/duneroadrunner/SaferCPlusPlus#safercplusp...](https://github.com/duneroadrunner/SaferCPlusPlus#safercplusplus-versus-rust)

[3]
[http://stackoverflow.com/questions/31609137/why-are-explicit...](http://stackoverflow.com/questions/31609137/why-are-explicit-lifetimes-needed-in-rust#31612025)

~~~
dbaupp
_> But others point out that explicit lifetime annotation is not required to
evaluate memory safety [3]._

I think you've extrapolated a bit too much from that answer: the code still
has to be semantically lifetime-correct, even if the annotations don't need to
exist syntactically. Code that wasn't written with Rust-style lifetimes in
mind (even if they're not literally written in the code) is unlikely to
automatically satisfy the rules statically, so unless SaferCPlusPlus is using
a custom compiler that runs Rust-like rules on the code and flags violations,
I would expect a lot of such code is only dynamically correct.
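
A minimal Rust sketch of the distinction being drawn here (the function names are hypothetical, chosen only for illustration): the two functions below have the same signature as far as the compiler is concerned, one with the lifetime elided and one with it written out. The annotation is optional syntax, but the lifetime rule is checked at every call site either way.

```rust
// Elided form: no lifetime appears in the source, yet the compiler still
// infers one and checks every caller against it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same signature with the lifetime written out explicitly.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned = String::from("hello world");
    let word = first_word(&owned);
    // Uncommenting the next line makes compilation fail whether or not the
    // annotation is written: `owned` would be moved while `word` borrows it.
    // drop(owned);
    assert_eq!(word, first_word_explicit(&owned));
    println!("{}", word);
}
```

Code translated from C has no such constraint built into its structure, which is why it may pass run-time checks and still be rejected by a static check of this kind.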

_> [2]
[https://github.com/duneroadrunner/SaferCPlusPlus#safercplusp...](https://github.com/duneroadrunner/SaferCPlusPlus#safercplusp...)_

A few points:

- What do you mean by rebindable and non-rebindable references? That is not
terminology Rust uses at all, and so it isn't clear what this is referring to.

- The restriction on multiple aliasing for mutable references is not just to
avoid objects being deallocated; it also avoids things like iterator
invalidation and concurrency problems like data races.
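
As a hedged illustration of the iterator-invalidation point (the function name `append_copy` is hypothetical), the sketch below shows a loop that Rust rejects because it would push into a vector while iterating over it, along with one version that is accepted. Nothing is deallocated by hand in the rejected loop; the danger is that `push` may reallocate the buffer the iterator is reading from.

```rust
// Appends a copy of the current contents to the end of the vector.
fn append_copy(values: &mut Vec<i32>) {
    // Rejected by the borrow checker (error E0502), because `iter()` holds a
    // shared borrow while `push` needs an exclusive one:
    //
    //     for v in values.iter() {
    //         values.push(*v);
    //     }
    //
    // Accepted: take a snapshot first, so the shared borrow has ended before
    // any mutation starts.
    let snapshot: Vec<i32> = values.clone();
    for v in snapshot {
        values.push(v);
    }
}

fn main() {
    let mut values = vec![1, 2, 3];
    append_copy(&mut values);
    println!("{:?}", values);
}
```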

~~~
duneroadrunner
> even if they're not literally written in the code

Yeah, that was my point. While you would still need to follow Rust's lifetime
rules, they wouldn't need to be explicitly expressed in the code. This comes
from a discussion I was having with some Rust people about extracting Rust's
static verifier/"borrow checker" so that it could be applied to C/C++ code (or
at least SaferCPlusPlus code). Some of them seemed to claim that this was not
really possible due to the lack of lifetime annotations in C/C++. But if all
that's required is implicit lifetime specification rather than explicit
lifetime annotation, then I don't see why the static verifier couldn't be
applied to C/C++ code. The author of the OP suggested (somewhere in his blog)
that you could effectively use Rust's static verifier on C code by using his
tool to translate the code into Rust and see if it compiles. Which would be
essentially equivalent to repurposing Rust's static verifier to work on C
code. So I was just trying to express my agreement with the plausibility of
this notion, where others might be skeptical.

> I would expect a lot of such code is only dynamically correct.

Yes, my view is that it would be better if the "brains" of the static verifier
were instead in the optimizer. That is, rather than refusing to compile code
that cannot be statically verified to be safe, you could require that code
have run-time checks to ensure memory safety (either by using a library like
SaferCPlusPlus, or by having the compiler inject them automatically). If the
optimizer can recognize that the code is intrinsically safe (in the same way
the borrow checker does), then it could simply strip out the unnecessary run-
time checks.

If the optimizer reported which run-time checks it was not able to discard,
then the functionality of the optimizer would be a superset of the
functionality of the static verifier. That is, if you self-impose the
requirement that all run-time checks be optimized out, then you'd have the
exact same functionality as the static verifier.
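
The contrast between statically verified and run-time-checked access can be sketched in Rust itself, using the standard library's `RefCell` as a stand-in for a run-time-checked pointer. This is only an analogy, not a description of how SaferCPlusPlus or any optimizer works, and the function names are hypothetical.

```rust
use std::cell::RefCell;

// Statically verified: exclusivity of `&mut` is proven at compile time, so
// no check is needed when this runs.
fn bump_static(counter: &mut i32) {
    *counter += 1;
}

// Dynamically verified: `RefCell` tracks outstanding borrows at run time.
// Returns false instead of mutating if the value is currently borrowed.
fn bump_dynamic(counter: &RefCell<i32>) -> bool {
    match counter.try_borrow_mut() {
        Ok(mut guard) => {
            *guard += 1;
            true
        }
        Err(_) => false,
    }
}

fn main() {
    let mut a = 0;
    bump_static(&mut a);

    let b = RefCell::new(0);
    assert!(bump_dynamic(&b)); // no other borrow is active: check passes
    let reader = b.borrow();
    assert!(!bump_dynamic(&b)); // a shared borrow is active: check fails
    drop(reader);
    assert!(bump_dynamic(&b));
    println!("a = {}, b = {}", a, b.borrow());
}
```

In these terms, the proposal above amounts to writing everything in the `bump_dynamic` style and relying on the optimizer to prove, where it can, that the failing branch is unreachable.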

The benefits of this approach are: i) sometimes you don't care if the code is
statically or dynamically correct (or you know that it's correct and you don't
care whether the compiler recognizes it or not), and you don't want to bother
having to contort your code to appease the borrow checker. But perhaps more
importantly, ii) the "(hopefully optimized out) run-time checks" approach
scales better to higher (application) level notions of "correctness", not just
memory safety. Right?

And perhaps less importantly, iii) as the compiler (optimizer) gets smarter
(i.e. is able to discard more run-time checks), existing code benefits (by
getting slightly faster).

> what do you mean by rebindable and non-rebindable references?

I mean references that can be reassigned to point to another variable. I
thought the term "mutable reference" might be unclear as to whether the
reference itself was reassignable or whether it facilitated mutation of the
thing being
referenced. I got the "bind" term from some Rust document that was trying to
clarify this "mutable reference" ambiguity. The term used throughout the
SaferCPlusPlus documentation is "retargetable". Would that be more clear? I'm
open to suggestions.
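
For reference, here is a minimal sketch of how the two notions look in Rust syntax (the function names are hypothetical). Rust puts `mut` in two different places: on the binding, which makes the reference retargetable, and on the reference type, which allows mutation of the referent.

```rust
// Retargetable ("rebindable") shared reference: the binding `r` is `mut`,
// so it can be pointed at another variable, but nothing can be modified
// through it.
fn retarget_demo() -> (i32, i32) {
    let a = 1;
    let b = 2;
    let mut r = &a;
    let first = *r;
    r = &b; // retarget the reference itself
    (first, *r)
}

// Non-retargetable mutable reference: the binding `m` is not `mut`, so it
// cannot be pointed elsewhere, but the referent can be modified through it.
fn mutate_demo() -> i32 {
    let mut x = 10;
    let m = &mut x;
    *m += 1; // mutate the thing being referenced
    x
}

fn main() {
    println!("{:?} {}", retarget_demo(), mutate_demo());
}
```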

> The restriction on multiple aliasing for mutable references is not just to
> avoid objects being deallocated, it avoids things like iterator invalidation
> and concurrency problems like data races.

Of course you're right, but I don't think it affects the clarity of the point
being made. I'm open to any suggestions for a better rephrasing.

------
drewm1980
This is a really cool project! Still very early stages though; in the article
he says he's "only" translated 6% of CVS to Rust.

I wonder if corrode is already good enough for translating very low level
numerical code, e.g. image processing...

