How to translate a large C project to Rust (thesharps.us)
158 points by vmbrasseur on Dec 9, 2016 | 7 comments



For the author:

You might consider using SaferCPlusPlus[1] as an intermediate step. Most C code translates easily (and automatably if you're already parsing the C code) into SaferCPlusPlus. This already gives you memory safety, if that is the goal.

Next, you can translate to "rustesque" SaferCPlusPlus. SaferCPlusPlus provides elements (pointer types) that (loosely) correspond to Rust elements [2]. The borrow checker's rule of "mutable reference exclusivity" would have to be additionally self-imposed. As you know, this is the hard (difficult to automate) part, but the nice thing is that even partially completed translations/rustifications would remain fully functional (and memory safe).

The idea is that "rustified" SaferCPlusPlus code should translate to Rust fairly easily. There seems to be some skepticism of the notion of a "rusty" C/C++ subset that could be easily translated to valid Rust. The claim seems to be that "lifetime annotation" is essential to the way Rust achieves memory safety without run-time overhead. But others point out that explicit lifetime annotation is not required to evaluate memory safety [3]. It just allows the programmer to express additional intent/expectations about object lifetimes.

Particularly for an application like CVS, I think translation to SaferCPlusPlus is clearly the more expedient solution. Again, the benefits of (intermediate) translation to SaferCPlusPlus: i) You get memory safety in step 1, and ii) all intermediate steps remain functional and memory safe.

[1] https://github.com/duneroadrunner/SaferCPlusPlus

[2] https://github.com/duneroadrunner/SaferCPlusPlus#safercplusp...

[3] http://stackoverflow.com/questions/31609137/why-are-explicit...


Sorry, but I really have to say that I think adding a waypoint through yet another language can't be a good thing. You're probably much better served by using the time saved from the second rewrite on the actual rewrite (C to Rust) you want to do, giving yourself more time to make it right.


> But others point out that explicit lifetime annotation is not required to evaluate memory safety [3].

I think you've extrapolated a bit too much from that answer: the code still has to be semantically lifetime-correct, even if the annotations don't need to exist syntactically. Code that wasn't written with Rust-style lifetimes in mind (even if they're not literally written in the code) is unlikely to automatically satisfy the rules statically, so unless SaferCPlusPlus is using a custom compiler that runs Rust-like rules on the code and flags violations, I would expect a lot of such code is only dynamically correct.
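To make the "semantically lifetime-correct without annotations" point concrete, here is a minimal Rust sketch. The function names (`first_word`, `first_word_explicit`, `demo`) are made up for illustration and are not from the linked answer. The first two functions have equivalent signatures; the compiler applies the same lifetime rules whether or not the annotation is written out.

```rust
// Elided form: no lifetime annotation is written, yet the compiler still
// infers that the returned slice borrows from `s` and checks every caller.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same function with the lifetime spelled out explicitly.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

// A caller that satisfies the implicit lifetime: `text` outlives the borrow.
fn demo() -> String {
    let text = String::from("corrode translates C");
    let word = first_word(&text);
    // Moving `text` away while `word` is still used would be rejected at
    // compile time, even though no lifetime appears in the source:
    //     drop(text);
    //     println!("{}", word);
    word.to_string()
}
```

Calling `demo()` returns the first word of the string as an owned `String`. Code that was not written with these rules in mind would likely trip over the commented-out pattern in many places.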

> [2] https://github.com/duneroadrunner/SaferCPlusPlus#safercplusp....

A few points:

- what do you mean by rebindable and non-rebindable references? That is not terminology Rust uses at all, and so it isn't clear what this is referring to.

- The restriction on multiple aliasing for mutable references is not just to avoid objects being deallocated, it avoids things like iterator invalidation and concurrency problems like data races.
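As a small illustration of the iterator-invalidation case, assuming a hypothetical helper named `append_doubled`: the naive loop in the comment is rejected by the borrow checker, so the function gathers the new elements first and mutates afterwards.

```rust
// Appends a doubled copy of every element. Pushing to `v` while iterating over
// it directly does not compile, because the loop holds a shared borrow of `v`
// while `push` needs an exclusive one:
//
//     for x in v.iter() { v.push(*x * 2); } // rejected: `v` is already borrowed
//
// Collecting the new elements first ends the shared borrow before mutation.
fn append_doubled(v: &mut Vec<i32>) {
    let doubled: Vec<i32> = v.iter().map(|x| x * 2).collect();
    v.extend(doubled);
}
```

In C or C++ the equivalent naive loop compiles and may read through a dangling iterator if the vector reallocates, which is a problem separate from deallocating the object itself.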


> even if they're not literally written in the code

Yeah, that was my point. While you would still need to follow Rust's lifetime rules, they wouldn't need to be explicitly expressed in the code. This comes from a discussion I was having with some Rust people about extracting Rust's static verifier/"borrow checker" so that it could be applied to C/C++ code (or at least SaferCPlusPlus code). Some of them seemed to claim that this was not really possible due to the lack of lifetime annotations in C/C++. But if all that's required is implicit lifetime specification rather than explicit lifetime annotation, then I don't see why the static verifier couldn't be applied to C/C++ code. The author of the OP suggested (somewhere in his blog) that you could effectively use Rust's static verifier on C code by using his tool to translate the code into Rust and seeing if it compiles, which would be essentially equivalent to repurposing Rust's static verifier to work on C code. So I was just trying to express my agreement with the plausibility of this notion, where others might be skeptical.

> I would expect a lot of such code is only dynamically correct.

Yes, my view is that it would be better if the "brains" of the static verifier were instead in the optimizer. That is, rather than refusing to compile code that cannot be statically verified to be safe, you could require that code have run-time checks to ensure memory safety (either by using a library like SaferCPlusPlus, or by having the compiler inject them automatically). If the optimizer can recognize that the code is intrinsically safe (in the same way the borrow checker does), then it could simply strip out the unnecessary run-time checks.

If the optimizer reported which run-time checks it was not able to discard, then the functionality of the optimizer would be a superset of the functionality of the static verifier. That is, if you self-impose the requirement that all run-time checks be optimized out, then you'd have the exact same functionality as the static verifier.
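For readers who want to see the two styles side by side, here is a rough Rust analogy using the standard library's `RefCell`, which enforces mutable-reference exclusivity with a run-time check instead of a compile-time one. This is only an analogy: it is not how SaferCPlusPlus works, and the function names are invented for the example.

```rust
use std::cell::RefCell;

// Dynamic check: RefCell verifies exclusivity at run time, so a conflicting
// borrow is reported as an error value instead of a compile error.
fn bump_checked(counter: &RefCell<i32>) -> Result<i32, String> {
    match counter.try_borrow_mut() {
        Ok(mut guard) => {
            *guard += 1;
            Ok(*guard)
        }
        Err(_) => Err("already borrowed".to_string()),
    }
}

// Static check: `&mut` proves exclusivity at compile time, so no run-time
// bookkeeping is needed.
fn bump_static(counter: &mut i32) -> i32 {
    *counter += 1;
    *counter
}
```

With a `RefCell` holding 0, `bump_checked` returns `Ok(1)`; it returns `Err` while some other borrow of the cell is still alive. The proposal above amounts to writing everything in the first style and having the optimizer turn it into the second style wherever it can prove the check never fails.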

The benefit to this approach is i) sometimes you don't care if the code is statically or dynamically correct (or you know that it's correct and you don't care whether the compiler recognizes it or not), and you don't want to bother having to contort your code to appease the borrow checker. But perhaps more importantly, ii) the "(hopefully optimized out) run-time checks" approach scales better to higher (application) level notions of "correctness", not just memory safety. Right?

And perhaps less importantly, iii) as the compiler (optimizer) gets smarter (i.e. is able to discard more run-time checks), existing code benefits (by getting slightly faster).

> what do you mean by rebindable and non-rebindable references?

I mean references that can be reassigned to point to another variable. I thought the term "mutable reference" might be unclear as to whether the reference itself was reassignable or whether it facilitated mutation of the thing being referenced. I got the "bind" term from some Rust document that was trying to clarify this "mutable reference" ambiguity. The term used throughout the SaferCPlusPlus documentation is "retargetable". Would that be clearer? I'm open to suggestions.
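In Rust terms, the distinction I have in mind looks roughly like the sketch below (function names are made up for illustration). The `mut` on the binding controls whether the reference can be retargeted; the `&mut` in the type controls whether the referent can be modified.

```rust
// Retargetable shared reference: `mut` on the binding lets `r` point at a
// different variable, but the referent cannot be modified through it.
fn retarget() -> (i32, i32) {
    let a = 1;
    let b = 2;
    let mut r = &a;
    let first = *r;
    r = &b; // the reference itself is reassigned
    (first, *r)
}

// Non-retargetable mutable reference: the referent can be modified through
// `r`, but `r` cannot be pointed elsewhere because the binding is not `mut`.
fn mutate_through() -> i32 {
    let mut a = 1;
    let r = &mut a;
    *r += 10;
    // r = &mut other; // would not compile: `r` is not declared `mut`
    a
}
```

Here `retarget()` yields `(1, 2)` and `mutate_through()` yields `11`.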

> The restriction on multiple aliasing for mutable references is not just to avoid objects being deallocated, it avoids things like iterator invalidation and concurrency problems like data races.

Of course you're right, but I don't think it affects the clarity of the point being made. I'm open to any suggestions for a better rephrasing.


This seems like a cool idea! Are there tools to automate the translation from C to SaferCPlusPlus?


Not yet, unfortunately. It might be a while before I get a chance to address it, so if anyone out there is looking for a project, it should be fairly straightforward. At least compared to tools like the subject of this post that engage in some non-trivial semantic reasoning. :)


This is a really cool project! Still very early stages though; in the article he says he's "only" translated 6% of CVS to Rust.

I wonder if corrode is already good enough for translating very low-level numerical code, e.g. image processing code...



