
C2rust vs. Corrode - JoshTriplett
https://jamey.thesharps.us/2018/06/30/c2rust-vs-corrode/
======
harpocrates
I'm a primary contributor to c2rust and I may be the person "stolen" away from
corrode. I'd like to apologize if it feels like we ripped off ideas without
giving due credit - the project wasn't really supposed to be "discovered" so
soon. The website was a throwaway idea, a means of easily sharing our work in
a limited circle while avoiding both the DARPA approval and the sub-par build
process (you have to build a huge chunk of Clang).

So here is me acknowledging Jamey's work: I personally did take inspiration
from Corrode, and I was expecting to work on Corrode proper when I joined
Galois. I've re-read the CFG module of Corrode several times (as well as the
Mozilla paper, and some older literature).

All that said, I also want to point out that Corrode hasn't had any activity
at all since last April - and that's not for want of PRs piling up. I'm not
criticizing here, since I understand that managing an open source project can be quite
time-consuming and stressful, but I feel like this also does need to be
mentioned. Also, c2rust can be freely forked. Once the DARPA funding runs out,
it is my hope that the Rust community will become the maintainers.

Finally, regarding the many improvements that can be automated: that is next
up on our plate!

------
daniel_rh
I'm a big corrode fan and I used it (plus a really simple python script
[https://github.com/dropbox/rust-brotli/blob/master/uncorrode...](https://github.com/dropbox/rust-brotli/blob/master/uncorrode.py) ) to help translate the brotli encoder
[https://github.com/google/brotli/tree/master/c/enc](https://github.com/google/brotli/tree/master/c/enc)
to safe, almost-idiomatic rust.

My initial attempts used top of tree corrode, but I found that the relooper
got in the way of writing idiomatic rust, so I restarted my translation
efforts and actually pinned corrode to before the relooper and backported a
lot of the improvements that happened since the relooping effort
[https://github.com/danielrh/corrode/tree/unprocessed_loops](https://github.com/danielrh/corrode/tree/unprocessed_loops)

On one hand I'm very glad corrode supported the relooper: it got me excited
about the capabilities of corrode, but in the end I think any serious port of
a project is going to need to think about idiomatic control flow and will
require a minor rewrite of the C code. The improvements to the C brotli
encoder happened here
[https://github.com/danielrh/brotli/commits/corrode](https://github.com/danielrh/brotli/commits/corrode)
to remove the gotos and some negative array access.

All in all, the process to get the brotli encoder into rust was very smooth.
Much of it may be due to the google engineers being extremely careful about
memory ownership and design.

Anyhow I'm excited about c2rust, but am nervous that corrode may be
de-emphasized since I know it worked well for me.

If you want to know more about the experience of porting a large existing
project to rust, I posted about the rust brotli decoder here
[https://blogs.dropbox.com/tech/2016/06/lossless-compression-...](https://blogs.dropbox.com/tech/2016/06/lossless-compression-with-brotli/) (subsection "The Port")

~~~
harpocrates
FWIW we've spent a lot of time trying to re-engineer the control-flow
translation to be heuristic-friendly in c2rust. We've extended and tweaked the
Relooper algorithm in hopes of preserving more of the initial control flow. It
still isn't great, but it is getting better.
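
To make the readability concern concrete, here is a hand-written sketch (not actual c2rust or Corrode output) of two shapes a translator might produce for the same simple loop. The first is the kind of state-machine fallback a Relooper-style pass can emit when it fails to recover structure; the second is what you hope to get when the original loop shape is preserved. The function names are hypothetical.

```rust
// Hand-written illustration; not output from c2rust or Corrode.
// Both functions compute 0 + 1 + ... + (n - 1).

/// Fallback shape: each basic block becomes a `match` arm, and the
/// `state` variable stands in for the jumps between blocks.
pub fn sum_state_machine(n: u32) -> u32 {
    let mut i = 0;
    let mut total = 0;
    let mut state = 0; // 0 = loop test, 1 = loop body, anything else = exit
    loop {
        match state {
            0 => state = if i < n { 1 } else { 2 },
            1 => {
                total += i;
                i += 1;
                state = 0;
            }
            _ => return total,
        }
    }
}

/// Structured shape: the loop from the source survives as a `while`.
pub fn sum_structured(n: u32) -> u32 {
    let mut i = 0;
    let mut total = 0;
    while i < n {
        total += i;
        i += 1;
    }
    total
}
```

Both versions behave the same, but only the second is something you would want to keep refactoring by hand.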

Offhand, here are some sizeable additions:

* Keep track of some extra information from the C source about what basic blocks were in loops or branch points, then use that information to try to extract back out similar looking Rust

* Support translating `switch` to `match`, complete with collapsing some patterns together

* Properly handle initialization
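
As a rough illustration of the `switch` to `match` item above, here is a hand-written sketch (again, not actual c2rust output). The C function in the comment and the name `classify` are made up for the example; the point is only that adjacent `case` labels sharing one body can collapse into a single or-pattern.

```rust
// Hand-written illustration; not actual c2rust output.
//
// Hypothetical C original:
//     int classify(int c) {
//         switch (c) {
//         case ' ':
//         case '\t':
//         case '\n':
//             return 0;   /* whitespace */
//         case '+':
//         case '-':
//             return 1;   /* sign */
//         default:
//             return 2;   /* anything else */
//         }
//     }

use std::os::raw::c_int;

pub fn classify(c: c_int) -> c_int {
    match c {
        // ' ' = 32, '\t' = 9, '\n' = 10: three `case` labels, one arm.
        32 | 9 | 10 => 0,
        // '+' = 43, '-' = 45.
        43 | 45 => 1,
        // `default:` becomes the wildcard arm.
        _ => 2,
    }
}
```

Cases that fall through after doing some work of their own do not map this directly and would need more care.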

That said, c2rust can be invoked _without_ relooper enabled if you so wish. In
that case, it will simply refuse to translate code with gotos.

~~~
voidmain
I have no idea what I'm talking about, but if you have a control flow
translation pass that is intended to translate constructs like goto and
produces "uglier" code, why not automatically disable it for functions (or
whatever granularity works) with no such constructs?

------
munin
It's good to set the record straight on the relationship between c2rust and
corrode. I was wondering that myself. Galois should give some credit back -
that's free for them. It sounds like an accident not to.

I'm sympathetic to the terms of a non-fundamental research program funded by
DARPA. Easy street with that agency is to get designated as doing fundamental
research. If you aren't at a university, that is an uphill battle that most
prime contractors won't be willing to deal with.

However, the pre-pub regime is not impossible to live under. You'd have to do
something like: have a private repo/branch where you do the work that's funded
by DARPA, rebase constantly against the public branch, and send once a month
public release request with a diff to merge into the public one. Probably
after the first few months, the PM will come up with a way to streamline the
process.

You lose some energy to friction, but your other option is to just not take
the money at all, so it's a question of do you spend 80% of your time working
on this project and 20% of your time dealing with administrative bullshit, or
0% of your time working on this project.

It's a choice that's given to many.

~~~
JoshTriplett
> send once a month public release request with a diff to merge into the
> public one

That makes it incredibly difficult to collaborate with others.

~~~
munin
I didn't say it wouldn't be difficult, I said it would not be impossible. How
badly do you want the money? How badly do you want to do the project? If the
answer is really badly, then, from personal experience, you will put up with
some friction. If the answer is not that badly, well, then, how angry can you
be when you said no? In my own career I've made decisions that go either way
at different times.

My ideal funding model is to have someone slide a manilla envelope full of
cash under the door every month, and never speak to me or compel me to do
anything. There are not that many opportunities like that out there. If you
can't find one of those, you're going to compromise.

------
rattray
I feel for the author of this post. It sounds like a sad and frustrating
position to be in.

At the same time, my respect for Galois and DARPA (and other contributors to
C2Rust) mostly only went up from reading it. It's unfortunate that corrode
wasn't acknowledged, but I feel fairly confident that was either for a real
reason (like some government contract thing) or forgetfulness.

It seems they acted reasonably and well, and did quality work. Their offer to
fund a few tech talks seemed generous (and mutually beneficial); I wouldn't be
surprised if they'd have paid for even a less organized talk, out of
gratitude/respect if nothing else.

In any case, I hope Jamey is able to feel better about this and find some
great work soon; seems like a great person to work with all around.

------
nurettin
While the author explains why the translation doesn't work on all code, he
mentions:

>> certain “features” of C that Rust didn’t provide.

Which then links to a markdown document that lists, among other things:

>> bitfields

I've used bitfields to represent demuxers in memory constrained embedded
programming and as far as I know, the devices still work today, so I am
confused as to why the sarcastic quotes there.

~~~
Sean1708
> With Mozilla’s support, I proved that the only real limits were around
> translating certain “features” of C that Rust didn’t provide.

Sounds more like he's talking about the "Likely Won't Ever Support"[0]
section.

[0]: [https://github.com/immunant/c2rust/blob/master/docs/known-li...](https://github.com/immunant/c2rust/blob/master/docs/known-limitations.md#likely-wont-ever-support)

------
ychen306
I took a cursory look at the output of C2rust, and it seems to only output
unsafe code. What's the point of translating C to (unsafe) Rust code?

~~~
harpocrates
The idea is that this is a first step towards safe Rust. First, you convert to
unsafe (but semantically preserving) Rust, then you refactor. The refactor
stage probably will involve changing some semantics (read: fixing bugs), or
perhaps proving some properties with an SMT solver before applying certain
transformations (converting a `libc::c_int` to an `i32`, or a `*const i32` to
a `&i32`).
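
A minimal sketch of those two stages, written by hand and not taken from c2rust: the function names are hypothetical, and `std::os::raw::c_int` stands in for `libc::c_int` so the snippet needs no external crate.

```rust
use std::os::raw::c_int;

/// Stage 1 (hypothetical literal translation): C types and raw pointers
/// are kept, so the caller still has to uphold the C contract.
///
/// # Safety
/// `p` must be non-null, properly aligned, and point to a live `c_int`.
pub unsafe fn sign_raw(p: *const c_int) -> c_int {
    // Dereferencing a raw pointer is only allowed in unsafe code.
    unsafe { (*p).signum() }
}

/// Stage 2 (hypothetical refactor): `c_int` becomes `i32` and the raw
/// pointer becomes a shared reference, which the compiler guarantees
/// is valid, so no `unsafe` remains.
pub fn sign(p: &i32) -> i32 {
    p.signum()
}
```

The refactor is only sound if every caller really did pass a valid, non-null pointer, which is the kind of property that has to be checked or proven before the transformation is applied.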

~~~
Animats
C2Rust basically compiles C into a very low level program in Rust. You've then
lost the C idioms. Now you have to decompile the low-level rust with pointer
arithmetic into Rust abstractions. That's very hard, probably harder than
converting C idioms to Rust idioms, checking to see if the result will be
equivalent, and falling back to low-level compilation only when absolutely
necessary.

The key to this is figuring out the comparable representation for data. Mostly
this is a problem with arrays, since C's array/pointer system lacks size info.
All C arrays have a size; it's just that the language doesn't know it. The
trick is to figure out how the program is representing the size info.
Somewhere, there was probably a "malloc" which set the size, and you may have
to track backwards to find it. Then you can replace the C array with a Rust
array that carries size information, and maybe eliminate variables which carry
now-redundant size info.
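
A small hand-written sketch of that idea, with hypothetical function names: the first version carries the length as a separate parameter the way C does, the second uses a slice that knows its own length, and the third shows one way to bridge the two at a boundary where the size has been recovered.

```rust
/// Before: pointer and length travel separately, as in C.
///
/// # Safety
/// `data` must point to at least `len` readable, initialized `i32` values.
pub unsafe fn sum_raw(data: *const i32, len: usize) -> i32 {
    let mut total: i32 = 0;
    let mut i = 0;
    while i < len {
        // Pointer arithmetic plus dereference: nothing checks the bound.
        total = total.wrapping_add(unsafe { *data.add(i) });
        i += 1;
    }
    total
}

/// After: the slice carries its length, so the separate `len` parameter
/// is redundant and iteration is bounds-safe by construction.
pub fn sum(data: &[i32]) -> i32 {
    data.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Bridge for call sites that still hold a raw pointer and a length.
///
/// # Safety
/// `data` must be non-null, aligned, and valid for `len` reads.
pub unsafe fn sum_from_c(data: *const i32, len: usize) -> i32 {
    sum(unsafe { std::slice::from_raw_parts(data, len) })
}
```

The hard part is not writing the slice version; it is establishing, for each pointer in the program, which variable (if any) holds its length.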

That would produce readable Rust. But it requires whole-program analysis.
That's OK, that's what gigabytes of RAM are for.

~~~
dbaupp
I suspect it's not possible for most interesting programs, even with whole
program analysis. As soon as you start storing pointers behind other pointers,
it's (very) hard to keep track of where they came from.

There's more discussion in the replies to
[https://news.ycombinator.com/item?id=17382464](https://news.ycombinator.com/item?id=17382464)
that you may've missed.

~~~
Animats
Yes, I know; I started that discussion.

~~~
dbaupp
I see, I'm sorry. It just seemed like you might not have noticed additional
replies to your comment, since you didn't reply there and essentially didn't
change what you said.

------
eb00
Really sickens me to read that they didn't acknowledge the prior work. It
would break my heart to be in this position, but then to not even get credit
is just too much.

~~~
dataking
[https://github.com/immunant/c2rust#acknowledgements-and-lice...](https://github.com/immunant/c2rust#acknowledgements-and-licensing)

~~~
Liquid_Fire
Note that the above acknowledgement was only just added[0], likely in response
to this blog post.

[0]
[https://github.com/immunant/c2rust/commit/e0d3adf656db000b1c...](https://github.com/immunant/c2rust/commit/e0d3adf656db000b1c845e775c311e0e724d4fe4)

------
nickpsecurity
If the concern is leaks, defense contractors like Galois might be able to get
DARPA to change that policy. My idea would be a split between stuff that will
definitely be FOSS and stuff that might be sensitive. Like Compartmented Mode
Workstations or with Qubes, they could even isolate them in VM’s whose border
color, labels, and firewall policies reflect the difference. The sensitive one
would be used in a new, custom project or derivative that pulls in the open
one. Whereas, the security policy wouldn’t let sensitive stuff interact with
the Internet at all.

Quite a few products on the market for this on top of free stuff like Qubes or
Muen. One can also use multiple boxes with KVM switch if worried about breaks.
Anyone with DARPA or Defense experience think this proposal has a good chance
of working?

~~~
munin
> If the concern is leaks, defense contractors like Galois might be able to
> get DARPA to change that policy.

I don't think any one thing at either the DARPA or Galois level can change
this policy. It's a question of whether or not you are doing fundamental
research, i.e. are you a 6.1, 6.2 or 6.3 program. If you want to do
fundamental research on an applied research contract, you're going to have a
bad time.

> Anyone with DARPA or Defense experience think this proposal has a good
> chance of working?

Frankly, no. This is a technical solution to a problem that is much deeper.

How does having a compartmented mode workstation help with this problem? Let's
say that there's the main fork A and the private fork B. Someone comes up with
a sensitive change to B that is, I dunno, about 50 lines. Someone else working
on both A and B sees this change to B and they think oh hey, I could do
something similar in A, so they write, from scratch, a similar change and
commit it to A. They do this without having any file cross any isolation
boundary, they just type stuff with their fingers.

------
yazr
Are there similar projects which mark up C to safer C?

I remember seeing something similar but I can't find it. Somehow, it analyzed C
code and produced C code with safe pointers, etc.

Can someone please help? Thanks

(Not ASAN or binary-instrumentation projects)

~~~
duneroadrunner
Not sure if it's what you're thinking of, but there is the "C to
SaferCPlusPlus" auto-translation helper tool[1]. The idea is that the output
can be compiled as straight C, with the safety mechanisms disabled, or with
the (compile-time and run-time) safety mechanisms enabled, which requires
compilation as C++. The tool is currently being neglected, and at the moment
it mostly just translates native arrays, and pointers that are being used as
array iterators.

I say "just", but actually it addresses what I believe to be (by far) the most
difficult issue, which, as other commenters have mentioned, is determining in
the general case whether a pointer is actually being used as an array
iterator. And it also determines whether that array is a fixed-size array or a
(potentially) resizable array.

For example, in this code snippet:

    
    
        void foo1(int *p, int n) {
            foo2(p, n);
        }
    

does _p_ point into an array, or just at an _int_? You have to deduce it from
context. And there's no theoretical limit to the amount of deduction you might
need to do.

The tool doesn't automatically translate regular "non-array" pointers yet, but
that can be a straightforward task as the SaferCPlusPlus library has a safe,
general drop-in replacement for pointers. Translating to "performance optimal"
safe pointers is another story. There you have challenges similar to
translating to safe Rust I think.

[1] shameless plug: [https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...](https://github.com/duneroadrunner/SaferCPlusPlus-AutoTranslation)

------
BooneJS
> I knew if I spent a year working on this project and then found out that it
> was going to be kept in-house forever, I would feel that my project and my
> effort had been stolen from me, no matter how much I got paid in the
> meantime.

I don’t know how to parse this. Is the author talking about corrode 2.0 being
stolen? Or that someone would pay him to write code that can’t be released or
discussed in public?

~~~
hawkice
My read is, if the goal is a super valuable open source project, developing it
and having it stuck in-house sucks. And what can you do? Reimplement it from
scratch open source? Sounds like a massive pain, and if the open source tool
is an important goal, the closed source project doesn't get you anywhere
except frustrated.

------
wyldfire
I'm trying to read between the lines here: is c2rust a libclang wrapper and
corrode is an original grammar based or regex based parser?

~~~
mastax
Corrode uses the Haskell library
[http://visq.github.io/language-c/](http://visq.github.io/language-c/) which
doesn't appear to be a libclang wrapper.

