
C2Rust: translate C into Rust code - blopeur
https://c2rust.com/
======
Animats
This is compiling C into very unsafe Rust as a target language:

    
    
        pub unsafe extern "C" fn insertion_sort(n: libc::c_int, p: *mut libc::c_int) -> () {
            let mut i: libc::c_int = 1i32;
            while i < n {
                let tmp: libc::c_int = *p.offset(i as isize);
                let mut j: libc::c_int = i;
                while j > 0i32 && *p.offset((j - 1i32) as isize) > tmp {
                    *p.offset(j as isize) = *p.offset((j - 1i32) as isize);
                    j -= 1
                }
                *p.offset(j as isize) = tmp;
                i += 1
            };
        }
    

The output is unmaintainable, like the output from a compiler. This isn't
translation into usable Rust.

A C++ to Rust translator would be a good thing, but it has to be much smarter
than this. This doesn't even comprehend arrays in C.

A good translator might need some human help to deal with C's ambiguities,
like "Is this an array? (Y or N)" and "How big is it? (Enter expression)". Then
you get a program that's real Rust. If the user is wrong about the size, safe
Rust will either allocate too much space or get a subscript error.

~~~
dbaupp
I know and agree that unsafe Rust is problematic, but that criticism of this
is missing the forest for the trees. The original code is just as unsafe.

Refactoring from unsafe Rust into safe Rust manually is likely to be easier
than directly from C into Rust: one is only trying to deal with one small
delta (like changing length/pointer arguments into a slice) rather than also
having to deal with syntax changes and tooling issues.
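
As a rough illustration of that "one small delta", here is a minimal sketch of turning the length/pointer pair into a slice. It assumes `libc::c_int` is a 32-bit `i32`, and the name `insertion_sort_raw` is a hypothetical shim for callers that have not been ported yet; none of this is output from the tool.

```rust
use std::slice;

/// Safe core: the (length, pointer) pair has become a slice.
pub fn insertion_sort(p: &mut [i32]) {
    for i in 1..p.len() {
        let tmp = p[i];
        let mut j = i;
        while j > 0 && p[j - 1] > tmp {
            p[j] = p[j - 1];
            j -= 1;
        }
        p[j] = tmp;
    }
}

/// Hypothetical shim that keeps the C-style signature for unported callers.
///
/// # Safety
/// `p` must point to `n` valid, writable `i32` values.
pub unsafe extern "C" fn insertion_sort_raw(n: i32, p: *mut i32) {
    if n <= 0 || p.is_null() {
        return;
    }
    // The only unsafe step left: trusting the caller's pointer and length.
    let s = unsafe { slice::from_raw_parts_mut(p, n as usize) };
    insertion_sort(s);
}

fn main() {
    let mut v = [5, 2, 4, 6, 1, 3];
    insertion_sort(&mut v);
    println!("{:?}", v); // [1, 2, 3, 4, 5, 6]
}
```

The unsafety shrinks to a single line in the shim, and the loop body is checked by the compiler and by bounds checks at run time.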

As others have said, it is essentially impossible to automatically translate
to safe Rust without non-trivial annotation effort, since C the language and C
in practice are both very different from how Rust guarantees safety. It seems to me
that that "annotation" effort is probably better spent going unsafe to safe
Rust (where, subjectively, the language is nicer to use, like better enums and
less error prone control flow) rather than annotating the original C.

~~~
amelius
> The original code is just as unsafe.

But the translated code being unreadable to humans adds a whole new layer of
"unsafe".

~~~
dbaupp
While the translated code is harder to read, it's much easier to refactor into
compiler-checked safe code than the original C. I think doing the translation
as part of a larger migration from C to _safe_ Rust is essentially the only
defensible reason to do it, as it doesn't seem like other reasons can justify
the risk of (as one of the sibling comments points out) bugs in the translator
and the code being unreadable, aka, in an "unnatural" representation of trying
to express C idioms in Rust.

That is to say, I don't think humans should be reading or maintaining this
code for more than it takes to refactor to be safe(r). (Whether this is how
this sort of tool is used in practice is a different question, and one that
definitely needs to be considered.)

------
devit
I think the next step would be to have a tool that can convert C into _safe_
Rust with a combination of static analysis and framework/program-specific
user-written rules to translate specific C framework constructs into Rust
equivalents.

An eventual goal could be for instance to automatically translate the Linux
kernel with the aid of a lot of custom rules to handle its constructs.

~~~
hyperpape
There's a real sense in which, if this were true, we might not need Rust. If
we could mechanically translate the Linux kernel to safe Rust, we could prove
the Linux kernel safe. If we could prove the Linux kernel safe, that would be
a strong argument against a need to translate it to Rust.

Note that this point is independent of the question of whether rewriting the
Linux kernel in Rust is actually a good/feasible idea. I also think that Rust
has other advantages over C that aren't just safety, so it's not a complete
comparison--safety is just the biggest one.

~~~
bluejekyll
The major difference would be that future versions would be in safe Rust as
well. All code written after that point would be safe.

What you say is true for a single point in time, not for the long-term future.

But, I would say this is probably infeasible, so theorizing too much about it
seems a little wasteful. If the Linux maintainers, Linus et al, suddenly
decided to convert to Rust, it would probably be done incrementally, module by
module. But it’s unlikely to happen given past statements.

~~~
hyperpape
If we had a standard for safe C that the linux kernel could feasibly meet,
then it could be a condition of future changes that it continue to be safe.

As you say, it's pretty hypothetical on both fronts--we're not gonna be able
to prove that about a C project like the kernel, and they're not gonna rewrite
in Rust any time soon.

~~~
bluejekyll
As I see it, the main benefits of C over other languages like Rust are
generally the ease of getting access to raw memory, sharing pointers, and
direct access to hardware. Rust is actually good at all of that too, but is
just as unsafe.

The point I’m making is that the mythical “safe C”, would have to remove many
of the benefits of why people enjoy C, so why not just use Rust at that point?

One variation I’ve seen is the idea of “safer C” put forward by DJB, which
would remove all undefined behavior as its main goal.

In any case, people would need to learn something new, and that seems to be a
huge barrier for any language.

~~~
vasili111
What does DJB stand for?

~~~
bluejekyll
Sorry, Daniel J Bernstein, a pretty famous (if you follow the space)
cryptologist and software engineer.

------
Animats
If you try making p a local array of length 10, ordinary Rust subscripting
comes out. So the translator knows what to do with arrays; it just makes very
C-ish assumptions about them. It just needs to know when a pointer is really
an array to treat it as an array.

This has potential. If you had some way to annotate or advise the translator,
it could do a much better job, and turn out safe, usable Rust.

Some of that can be done automatically by looking at calls to functions. If
it's always an array going in, the formal parameter can become an array. Any C
that doesn't have explicit pointer arithmetic should be translated into Rust
that doesn't have explicit pointer arithmetic.
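
A sketch of what that could look like if the formal parameter became an array. This is a hand-written guess at a possible target, not output from the translator; the name `insertion_sort_array` and the use of a const generic length `N` are assumptions, as is `i32` standing in for `libc::c_int`.

```rust
/// Hypothetical target: the parameter is an array of known length,
/// so plain subscripting replaces pointer arithmetic.
fn insertion_sort_array<const N: usize>(p: &mut [i32; N]) {
    let mut i = 1;
    while i < N {
        let tmp = p[i];
        let mut j = i;
        while j > 0 && p[j - 1] > tmp {
            p[j] = p[j - 1];
            j -= 1;
        }
        p[j] = tmp;
        i += 1;
    }
}

fn main() {
    let mut a: [i32; 10] = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0];
    insertion_sort_array(&mut a);
    println!("{:?}", a); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    // An index past the end is rejected instead of reading stray memory.
    assert!(a.get(10).is_none());
}
```

The loop structure is deliberately left C-ish; only the pointer arithmetic is gone.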

------
m0meni
How does this compare to
[https://github.com/jameysharp/corrode](https://github.com/jameysharp/corrode)?

~~~
glguy
(I'm one of the authors of this tool)

Our original plan was to simply work to improve Corrode. We eventually decided
to implement a new tool that uses Clang as the frontend in order to get
more reliable parsing, preprocessing, and type checking of the input C code.
The result is that we are able to support more code and C extensions than we
thought we'd be able to by building on top of the good work of the Corrode
project.

~~~
phkahler
Is there any hope of going after even a subset of C++? I know the difference
is enormous, I'm just asking the question.

~~~
glguy
We've only started pondering what subset of C++ we might be able to support in
future work for the project. Currently only C is in scope. That's an obvious
next goal but it seems quite a bit more daunting!

------
craftyguy
Missed opportunity to name it 'crust'

~~~
kps
‘C-water’, because it makes things rust.

~~~
projektfu
Or maybe c-salt

------
grok2
The insertion sort conversion example makes Rust look intimidating and verbose
as compared to C :-).

~~~
glguy
Fortunately, once you start refactoring the Rust to take advantage of the
functionality that doesn't exist in C (like slices, iterators, etc), things
start to look much cleaner.
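
For a sense of where such refactoring might end up, here is one possible hand-written version using a generic slice and an iterator-based check. It is only a sketch: it swaps adjacent elements rather than shifting them as the C original does, and it is not produced by any tool mentioned in this thread.

```rust
/// Insertion sort over any slice of ordered items.
fn insertion_sort<T: Ord>(items: &mut [T]) {
    for i in 1..items.len() {
        let mut j = i;
        // Walk the new element left until its neighbor is not larger.
        while j > 0 && items[j - 1] > items[j] {
            items.swap(j - 1, j);
            j -= 1;
        }
    }
}

fn main() {
    let mut words = vec!["pear", "apple", "fig"];
    insertion_sort(&mut words);
    // Iterator-based check that every adjacent pair is in order.
    assert!(words.windows(2).all(|w| w[0] <= w[1]));
    println!("{:?}", words);
}
```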

Translating the code into ugly, unsafe Rust is intended to only be the first
step. We're working on tools to help with that refactoring process, too.

~~~
taneq
Sounds like a case of "writing C in Rust" compared with "writing Rust". You're
not really writing in a language until you're thinking in its idioms.

~~~
masklinn
> Sounds like a case of "writing C in Rust" compared with "writing Rust".

Duh? It's literally taking C code and generating Rust code which behaves the
same.

> You're not really writing in a language until you're thinking in its idioms.

The entire point is to get your foot in the door.

This gives you a pile of Rust code which (bugs aside) should behave the exact
same way C code does.

From there on you're living in the Rust world and can improve that as you see
fit.

If you can afford doing the initial transition in one shot, later
improvements are much simpler than having to maintain an internal (moving)
front between remaining C code and new Rust code as e.g. librsvg does.

~~~
lomnakkus
> If you can afford doing the initial transition in one shot, later
> improvements are much simpler than having to maintain an internal (moving)
> front between remaining C code and new Rust code as e.g. librsvg does.

That's all well and good... _if_ you don't have to maintain the project while
doing the necessary C-as-Rust -> "proper" Rust conversion. This C-as-Rust
output seems[1] like it would be incredibly difficult to do bugfixes in and
there's also the issue of what happens if the C code happened to rely on
implementation-defined behavior (or even UB which the implementation happens
to "do the right thing" with).

Unless you have a trivial amount of code, I'd say the right thing is to do the
conversion in small chunks. This also gives you a much better way to ensure
(through test suites) that you're not introducing excessively many new bugs
when rewriting C-as-Rust to "proper" Rust.
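
One way to picture the test-suite side of this is a differential test: keep the C-as-Rust function, write the "proper" Rust replacement next to it, and check that both agree on many inputs. The sketch below adapts the insertion sort quoted earlier in the thread (with `i32` assumed in place of `libc::c_int`); the helper names and the small pseudo-random generator are made up for illustration.

```rust
/// C-as-Rust version, adapted from the translator output quoted in the thread.
unsafe fn insertion_sort_raw(n: i32, p: *mut i32) {
    unsafe {
        let mut i: i32 = 1;
        while i < n {
            let tmp = *p.offset(i as isize);
            let mut j = i;
            while j > 0 && *p.offset((j - 1) as isize) > tmp {
                *p.offset(j as isize) = *p.offset((j - 1) as isize);
                j -= 1;
            }
            *p.offset(j as isize) = tmp;
            i += 1;
        }
    }
}

/// Hand-refactored replacement for one "chunk".
fn insertion_sort_safe(p: &mut [i32]) {
    for i in 1..p.len() {
        let tmp = p[i];
        let mut j = i;
        while j > 0 && p[j - 1] > tmp {
            p[j] = p[j - 1];
            j -= 1;
        }
        p[j] = tmp;
    }
}

/// Tiny deterministic generator (an LCG) so the sketch needs no crates.
fn next(state: &mut u64) -> i32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    ((*state >> 33) as i32) % 100
}

/// Runs both versions on the same generated inputs and compares results.
fn agree_on_random_inputs(rounds: usize) -> bool {
    let mut state = 42u64;
    for round in 0..rounds {
        let len = round % 17;
        let input: Vec<i32> = (0..len).map(|_| next(&mut state)).collect();
        let mut a = input.clone();
        let mut b = input;
        unsafe { insertion_sort_raw(a.len() as i32, a.as_mut_ptr()) };
        insertion_sort_safe(&mut b);
        if a != b {
            return false;
        }
    }
    true
}

fn main() {
    assert!(agree_on_random_inputs(200));
    println!("both versions agree");
}
```

Agreement on generated inputs is evidence, not proof, and it says nothing about inputs where the original relied on undefined behavior.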

[1] I realize this is early days, but I'm talking about the present time.

------
csomar
So the demo does actually translate C into unsafe Rust. I'm guessing this
translator is, thus, unaware of Rust borrowing/ownership system?

~~~
masklinn
> So the demo does actually translate C into unsafe Rust.

It's semantics-preserving, and C constructs are not expected to match safe
Rust, so that makes sense. Corrode does more or less the same.

Citrus generates "safe Rust", but it also doesn't generate working code.

~~~
sitkack
The compilation target could be a Rust slab [0], much like Emscripten uses
Typed Arrays [1]. In doing so, it would retain much of the safety properties
of Rust, but it pushes the problem into maintaining "the heap" for the C code.

[0] [https://github.com/carllerche/slab](https://github.com/carllerche/slab)

[1] [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays)
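
A very rough sketch of the idea, using a plain `Vec` rather than the slab crate: every C "pointer" becomes an offset into one buffer, and every load or store goes through a bounds-checked index. The types `Heap` and `Ptr` and their methods are hypothetical names invented for this illustration.

```rust
/// Hypothetical "heap" for translated C code: one growable buffer of words.
struct Heap {
    mem: Vec<i32>,
}

/// A translated "pointer" is just an offset into the heap.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Ptr(usize);

impl Heap {
    fn new() -> Self {
        Heap { mem: Vec::new() }
    }

    /// Bump allocation only; a real design would also have to handle `free`.
    fn alloc(&mut self, words: usize) -> Ptr {
        let p = Ptr(self.mem.len());
        self.mem.resize(self.mem.len() + words, 0);
        p
    }

    /// Bounds-checked read: panics instead of reading outside the buffer.
    fn load(&self, p: Ptr, off: usize) -> i32 {
        self.mem[p.0 + off]
    }

    /// Bounds-checked write.
    fn store(&mut self, p: Ptr, off: usize, v: i32) {
        self.mem[p.0 + off] = v;
    }
}

/// The insertion sort from the thread, expressed against the heap.
fn insertion_sort(heap: &mut Heap, n: usize, p: Ptr) {
    let mut i = 1;
    while i < n {
        let tmp = heap.load(p, i);
        let mut j = i;
        while j > 0 && heap.load(p, j - 1) > tmp {
            let prev = heap.load(p, j - 1);
            heap.store(p, j, prev);
            j -= 1;
        }
        heap.store(p, j, tmp);
        i += 1;
    }
}

fn main() {
    let mut heap = Heap::new();
    let p = heap.alloc(5);
    for (i, v) in [4, 1, 3, 5, 2].iter().enumerate() {
        heap.store(p, i, *v);
    }
    insertion_sort(&mut heap, 5, p);
    let out: Vec<i32> = (0..5).map(|i| heap.load(p, i)).collect();
    println!("{:?}", out);
}
```

No `unsafe` is needed, but an offset that strays into a neighboring allocation inside the same buffer is still accepted, which is the "maintaining the heap" problem in miniature.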

~~~
masklinn
That would be significantly less useful if the intent is to move the code from
C to Rust.

~~~
sitkack
How so?

It allows unsafe C to be incorporated into safe Rust while maintaining the
same level of safety.

A similar less ergonomic solution would be to load C via WASM modules.

I'd take slab or wasm over whole lib translation into unsafe Rust.

------
mankash666
What's the use case for this? Are native C binaries/libraries not invokable in
Rust?

~~~
civilitty
Yes, you can call any C using the Rust FFI but then you need to create a safe
wrapper around the unsafe Rust calls to C. This tool looks like it's for
generating a first pass at porting existing C code to Rust. Once you have that
C-to-Rust conversion, it becomes much easier because you can incrementally
port select types instead of incrementally porting the API and dealing with
the impedance mismatch between the Rust api and C api (you can work on
function arguments instead of function calls).

------
sometimesijust
This either does not work well OR there is no need for Rust to exist. Is there
a pressing need for Rust code that is as unsafe as C?

~~~
oconnor663
The goal is to assist the human in translating C code to safe Rust code.
Getting to unsafe Rust code as an automated first step saves a _lot_ of work.

------
brian_herman

        #define a "xxxxxxxxxxx"
        #define b a a a a a a a
        #define c b b b b b b b
        #define d c c c c c c c
        #define e d d d d d d d
        #define f e e e e e e e
        #define g f f f f f f f
        #define h g g g g g g g
        #define i h h h h h h h
        #define j i i i i i i i
        main(){char*z=j;}
    

This fails. The error is entity not found.
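
For scale, a back-of-the-envelope calculation of how large the literal becomes once the preprocessor expands `j`: the base string is 11 characters and each of the nine levels (`b` through `j`) pastes seven copies of the previous one. Whether this size is what triggers the reported error is an assumption; the function name below is made up.

```rust
/// Length of the expanded string: base length times copies^levels.
fn expanded_len(base: u64, copies: u64, levels: u32) -> u64 {
    base * copies.pow(levels)
}

fn main() {
    // 11 characters, 7 copies per level, 9 levels (b through j).
    println!("{}", expanded_len(11, 7, 9)); // 443889677
}
```

That is roughly 444 MB of string data from eleven lines of source.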

~~~
glguy
There are definitely ways to overwhelm the translator web demo. I know I'm not
clever enough to block all of them. I put the page up to give people a way to
try out the translator without having to build it. Please don't kill it :-)

~~~
brian_herman
Oh sorry

