One highlight of this post is their work exposing a memory safety error in the original source code:
> The Rust compiler also flagged a similar but actually buggy example in G_TryPushingEntity where the conditional is >, not >=. The out of bounds pointer was then dereferenced after the conditional, which is an actual memory safety bug.
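A minimal sketch of that bug class (identifiers are mine, not the actual G_TryPushingEntity code): in C, a guard of `idx > MAX` still admits `idx == MAX`, and the subsequent dereference reads one element past the end. Safe Rust's bounds check turns the same mistake into a panic instead of silent memory unsafety.

```rust
// Hypothetical illustration of the off-by-one pattern, not the real code.
// The buggy C guard was `>`, which lets idx == len through and then
// dereferences one past the end of the array.
fn try_push(entities: &[u32; 4], idx: usize) -> Option<u32> {
    // Correct guard: `>=` also rejects idx == entities.len().
    if idx >= entities.len() {
        return None;
    }
    // With the buggy `>` guard, safe Rust would panic here rather than
    // read out of bounds.
    Some(entities[idx])
}
```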
Interestingly, I've also found bugs like this in original source code when rewriting in just TypeScript, so I suspect it's simply having someone take a fresh look that allows the bug to be discovered.
Author here: I wanted to let everyone know we made some small edits to the text, and added two new branches to our repository: transpiled containing the raw transpiler output [1], and refactored containing the same code [2] after a few refactoring commands.
The closest Rusty feature to C++ templates is most likely macros. I don't think even const generics, specialization etc. (which are WIP anyway at the moment) would be enough to replicate templates in the fully general case.
The simple answer is that C++ templates can do more than Rust generics can, including some things that in Rust you'd use the macro system for, and probably some things that you'd have to translate by expanding out the template and turning that result into Rust.
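To illustrate the split (examples are my own): a typed C++ function template usually maps onto a Rust generic with a trait bound, while duck-typed template tricks with no clean bound tend to become declarative macros, which, like templates, are only fully checked after expansion at each call site.

```rust
use std::ops::Add;

// A C++ template like `template<class T> T sum3(T, T, T)` maps to a
// bounded generic: the bound must spell out what the body needs.
fn sum3<T: Add<Output = T> + Copy>(a: T, b: T, c: T) -> T {
    a + b + c
}

// Duck-typed template code often falls back on a macro instead,
// expanded per call site much like a template instantiation.
macro_rules! max_of {
    ($a:expr, $b:expr) => {
        if $a > $b { $a } else { $b }
    };
}
```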
I recall seeing a presentation of c2rust by Andrei at a conference a while back. It's great to hear that it's making good progress. I'm curious how good c2rust is at producing idiomatic Rust code. It seems like that would be difficult and yet very valuable.
> we'd love to hear what you want to see translated next.
qemu [1] is all in C (probably C89 or "gnu89") and would make an interesting project, IMO.
I looked for the transpiled source, but I couldn't find it. I suspect it's not particularly idiomatic Rust, since it has to preserve all the details of the C code; still, I think it's a very promising project.
The fact that they found a memory safety bug is excellent. Maybe translating old C projects to Rust via this method can be used to find memory issues as well as to aid rewriting in Rust projects.
Their plan is twofold, in my understanding: first, translate the code fairly literally; second, provide unsafe-to-safe refactoring tools. As far as I know they're still on step 1.
Looks like it's meant to enable projects similar to how the Go compiler/toolchain was initially translated (with the like-named c2go) and then gradually rewritten over time into idiomatic Go. It's easy to see that this could be a much more tenable project than a grand rewrite that has two codebases iterating forward in parallel for a time.
The long tail of patterns c2rust would need to recognize to reliably make idiomatic code seems like a much larger problem. With that as the goal, it might make sense to build a separate rust-to-rust "idiomaticizer" that can recognize opportunities to reduce code complexity through refactoring, and apply idiomatic patterns without changing application behavior.
Maybe. Though there might be low-hanging fruit: for example, recognizing arrays that could be translated to slices, and heap allocations traversed via pointer arithmetic that could be converted to Vec instead.
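A sketch of that kind of fruit (function names are mine): the raw-pointer traversal a transpiler would emit, next to the slice-based refactoring target.

```rust
// Shape of typical transpiler output: a raw pointer plus an explicit
// length, walked with pointer arithmetic inside `unsafe`.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0;
    for i in 0..n {
        total += *p.add(i);
    }
    total
}

// The refactoring target: the (pointer, length) pair becomes a slice
// and the unsafe block disappears entirely.
fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}
```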
The real issue with doing that, imo, isn't going to Rust; it's understanding what still needs to be addressable from C. For example, a self-contained application has no external ABI dependencies, so the tool could assume everything that references a slice is Rust-only. A library, on the other hand, might still need to present a C ABI.
The generated code looks fairly straightforward, however I wouldn't call it "idiomatic".
What's exciting though is that (at least) both Rust and Zig get this ability to translate C code into the "native" language (and sometimes even finding bugs in the process). This basically turns C into a "meta-language" to create "bedrock" cross-platform API modules for a variety of modern languages, essentially the next step from "C as a cross-platform assembler".
And to be honest, this sort of "boring" low-level system API glue code (which mainly talks to native C system APIs anyway) isn't a joy to write in any language and doesn't benefit much from modern language features, so you can just as well write it once in C and then "transpile everywhere".
This seems pretty contrary to the purpose of Rust, since the generated code is full of non-idiomatic unsafe and possibly situationally unsound pointer manipulations that could definitely be done in idiomatic safe code. No reason I can see not to just write this sort of thing in Rust to begin with and expose it to other languages via the C ABI to call with FFI or even co-compile with other LLVM languages.
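A minimal sketch of that approach (the function is illustrative): the logic lives in safe Rust, and only the exported signature touches the C ABI.

```rust
// Safe Rust internals, exported through the C ABI so C, Zig, Python
// (via ctypes), etc. can call it; no unsafe needed in the body.
#[no_mangle]
pub extern "C" fn clamp_to_u8(value: i32) -> u8 {
    value.clamp(0, 255) as u8
}
```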
Yeah, theoretically, but then you have the same problem as using C libraries from Rust: the build process for those other languages suddenly becomes much more complicated because they must integrate with another language's toolchain.
Let's say I want to expose a Zig library to Rust or vice versa; now I must call the Zig compiler from Cargo, or Cargo from the Zig build process.
With a transpiled library such a project is "clean", it only needs the Rust or Zig toolchain. The transpiling process only needs to happen once, and this can happen automatically on a CI service when the base library is updated and then made available automatically through the language's package repository.
Rust probably won't get the ability to transpile to Zig, and Zig probably won't be able to transpile to Rust. Some sort of lingua franca makes sense in a multi-language world, and C (or even a subset of C) makes sense for that because it already is the lingua franca and has a very small feature set to consider. I'm aware that this is a controversial opinion though ;)
If we were going to create a lingua franca for such a purpose from scratch, it would probably look very little like C. It would probably look like a much reduced subset of some compiler IR, or Haskell. We would want something that is type-safe and lowers trivially to an AST. As the barest of table stakes, a language with any undefined or implementation-defined behavior whatsoever is obviously unsuitable for this purpose. After all, we need a wide variety of compilers to be able to easily generate their own IR from it, and for them all to guarantee exactly the same behavior when they do. C seems close to the worst choice possible.
Edit: Haha, I realized I invented WASM again. This happens frequently...
The point of c2rust is to be a transitional tool: once the system is done in Rust, compiles with Rust tooling, and works as it used to, it's assumed it'll be much easier to manage and iterate on.
What is the ceiling for how good (i.e., "close to generating idiomatic Rust") it could get? Given the differing semantics, it seems like it should be nearly impossible to generate anything close to idiomatic Rust, but I know very little about mechanically translating between programming languages and I would be delighted to be incorrect.
I'd say "pretty low" on the idiomatic scale; however, a very desirable goal would be for them to make most of the generated code safe. Limiting the scope of unsafe would make what remains easier to convert to safe Rust, and then you're off to the refactoring races with much more confidence.
A pretty good goal would be the ability to recognise widespread high-level C patterns and convert them to idiomatic Rust patterns, but even that would be difficult; generally speaking, idiomatic C and idiomatic Rust are likely to be structured quite differently, especially on the data-structure side.
Although not automatic, I find Rust's clippy to be very good at suggesting improvements to code that compiles but simply is not written the way it should be.
Not that Minecraft runs well or is a marvel of graphical technology, but at least it proves you can do 3D FPS-style games on the JVM well enough to be the most popular game around.
I would think that with the newer GCs (like Shenandoah) for Java, the GC pause thing might become a non-issue soon enough. Not sure that works for Clojure though.
I kind of misspoke; Clojure has had difficulties in the past supporting the latest versions of the JVM; it's possible that it works ok with Shenandoah...admittedly I haven't tried it yet.
Benchmarks would be interesting, in the sense that it would help make sure that the translation is doing the same thing, but it wouldn't say a ton about Rust vs C in a more general case. This translates to unsafe code, which means it's not really representative of Rust generally.
We didn’t benchmark this project (yet), but previous c2rust translations had approximately equal performance to the original. This is to be expected, since the transpiled output is unsafe Rust that is equivalent to the original C code. One caveat is checked vs. unchecked array indexing, but unless an array indexing operation into a statically sized array is hot _and_ the Rust compiler can’t elide the checks, that’s unlikely to make much of a difference.
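To illustrate that caveat (a toy example of mine): safe indexing inserts a bounds check that the optimizer can usually elide when the bound is statically known, while `get_unchecked` reproduces the unchecked C semantics exactly.

```rust
// Checked: each `xs[i]` carries a bounds check, but because the loop
// bound equals the array length, the optimizer can typically elide it.
fn sum_checked(xs: &[i32; 8]) -> i32 {
    let mut t = 0;
    for i in 0..xs.len() {
        t += xs[i];
    }
    t
}

// Unchecked: semantically identical to the C indexing c2rust preserves;
// the caller must guarantee the index is in bounds.
fn sum_unchecked(xs: &[i32; 8]) -> i32 {
    let mut t = 0;
    for i in 0..xs.len() {
        t += unsafe { *xs.get_unchecked(i) };
    }
    t
}
```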
What is the realistic best-case scenario? Has anyone demonstrated that well-written Rust code should run faster than C, or is the hope simply to get performance parity with C while maintaining memory safety?
This topic is really tricky, and there's a lot of different ways to slice it. In the most general sense, if you take a C expert, and a Rust expert, and ask them to do something, they should be able to get the exact same results.
But that leaves out other interesting questions that inform this comparison. For example:
* What about the average Rust programmer vs the average C programmer? Maybe two experts can produce identical results, but does one language vs the other make it so that a non-expert tends to produce faster code?
* What about "no unsafe Rust other than what's in the implementation of the standard library"? Most Rust code doesn't use unsafe, so is it fair to let the Rust version use it? How much?
* What about "no code other than the standard libraries"? I may not be an expert, but Rust's package ecosystem makes it easier to use code developed by others. This is sort of a slightly different take on the two previous questions at the same time.
* Should there be a time limit on development? If so, should the C have the same requirements of demonstrating safety as the Rust, that is, sketching out an informal proof of correctness?
* What about some sort of "how maintainable is this" metric? Mozilla tried to implement parallel CSS in C++ twice and couldn't do it, because the concurrency bugs were just too many. Their Rust attempt succeeded, thanks to the safety guarantees. Is this comparison valid? Or could someone who was better at C++ have implemented it in theory with no bugs? (This is sort of question 1, but slightly different.)
Theoretically, I would argue that Rust could get faster than C, because it can make stronger guarantees about the code, which would allow further and much more complex optimizations. You could argue that the set of possible outputs of a piece of C code is much wider than that of a piece of Rust code, and that you could compile the code down to some sort of state machine that equivalently represents the original in a more compact form. The more well-defined the outputs are, the more minimal the output representation can be.
I think it would be particularly interesting to compare idiomatic Rust and C code. For example, given C's lack of generics, I would expect Rust to have a number of wins, including better inlining opportunities for polymorphic functions a la qsort.
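A small example of that inlining opportunity (my own sketch): `qsort` receives an opaque function pointer that C compilers rarely inline, while `sort_by` is monomorphized for each closure type, so the comparator can be inlined into the sort loop.

```rust
// The closure has a unique anonymous type, so sort_by is compiled
// specifically for it and the comparison can be inlined, unlike
// qsort's indirect call through a function pointer.
fn sort_desc(v: &mut Vec<i32>) {
    v.sort_by(|a, b| b.cmp(a));
}
```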
Yes, the most recent example I can think of is http://cliffle.com/p/dangerust/ in which the author translates heavily optimized C to Rust in a quite mechanical way. It comes out slightly faster. In the last part they rewrite it in idiomatic Rust and it ends up faster still, without any of the bespoke optimizations the C code used or any unsafe Rust.
In many instances the Rust type system should allow the compiler to emit faster code, because it has more data to use for reasoning about valid and invalid states of the program. Depending on how good c2rust is, the Rust version should be roughly as fast; refactored to idiomatic Rust, it could be meaningfully faster than C.
Rust will outperform C consistently soon. It already does in some cases, and it will do so with safe, idiomatic code in most cases. The reason behind my statement is simply more opportunity for the optimizer: Rust's rules allow for more aggressive optimization. The reasons it's not outperforming C yet are LLVM bugs and some missing Rust language features and fixes. LLVM in particular is driven by C needs, so they rarely fix bugs present only in rustc-generated IR.
This is grey, but shouldn't be. I assume people are downvoting it because it doesn't give examples of the optimization opportunities it claims. So here are two that I think are likely most significant.
1. Rust’s aliasing semantics are more powerful than C’s unless you use the `restrict` keyword everywhere, which most people don’t. It’s well recognized that Fortran continues to generally outperform C in many numeric applications because Fortran compilers are free to perform optimizations that C compilers cannot or don’t, since aliased arguments are undefined behavior in Fortran. The Rust compiler currently doesn’t pass the relevant hints to the generated LLVM IR due to a long history of LLVM bugs in `noalias` handling, in large part because those code paths are so rarely exercised when compiling C code, itself due to the relatively low usage of the `restrict` keyword.
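A tiny example of that guarantee (mine, not from the thread): two `&mut` references are statically known not to alias, so the compiler may keep `*a` in a register across the write through `b`, something a C compiler can only assume if both pointers are marked `restrict`.

```rust
// &mut references never alias, so the second `*a += 1` needs no
// reload of *a after the store through b. In C, the equivalent
// optimization requires `int *restrict a, int *restrict b`.
fn bump_both(a: &mut i32, b: &mut i32) -> i32 {
    *a += 1;
    *b += 10;
    *a += 1;
    *a
}
```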
2. Implicit arena allocations. The Rust compiler has access to information about possible aliasing and overlapping lifetimes that it can use to replace multiple allocations with similar lifetimes with offsets into a single allocation that is then all freed together. This is a complicated topic, but work is ongoing to make this a reality.
2 might sound overly optimistic like "Rust will eventually beat C" but V8 already has an allocation folding optimization which combines multiple objects into a single site. There's also run time profiling to try and figure out the allocation lifetime.
I'm not disputing that. In theory Rust should outperform even today but of course reality is a different beast and implementation details are important.
Then you also get into architecture specifics etc.
dodobirdlord described the two major language-level advantages Rust has very well, so I won't repeat them, but the point I'm making is that Rust simply allows more optimizations purely at the language level. Specifics of backend implementations are of course another topic entirely (and one of the reasons I'm excited about the possibility of a GNU rustc).
There are some "shoulds" related to aliasing I believe. Things the compiler can assume when compiling a rust program that it can't when compiling C, which should in theory let it generate more efficient code.
But last I heard Rust wasn't taking advantage of this due to issues with LLVM. The same benefits should also be achievable in C with pragmas.
> The same benefits should also be achievable in C with pragmas.
I don't know of any compiler that provides a noalias-all pragma. I don't know that it would be very sane either, since C very much defines pointers as aliasing by default. Plus you wouldn't have any way to revert this behaviour and mark pointers as aliasing.
> C very much defines pointers as aliasing by default.
This is actually a huge issue for the CRuby JIT which generates C from the VM bytecode. Alias analysis on the generated C code, which is full of indirection, is basically useless and severely restricts the possible optimizations.
Not answering your question directly, but I remember seeing a benchmark recently where there was more difference in performance between the same program compiled with gcc versus clang than between the clang-compiled C++ and the Rust versions.
So I believe that when you are that close in terms of performance, you can assume they all perform the same, and it should not be a decision driver whatsoever.
There's an example of the output and a discussion of this elsewhere in this thread. TL;DR: not as much as it could be, yet. This is a work in progress.
While it might be possible, my guess is it would be much more difficult than for C. AFAIK Rust is more or less a superset of C, whereas Python and Ruby have numerous dynamic features that aren't present in Rust and would likely require a bespoke Rust runtime to emulate. Types are also a whole can of worms here.
I think in relation to Ruby, Python, and other languages like them, Rust's best application is leveraging their support for C FFIs to replace hot paths. While Rust has a significant learning curve, it's probably much simpler for, say, a Ruby programmer to pick it up and write safe, performant code than C.
Technically it is possible. The gain won't be there in most cases, though, as the generated code would be complex and slow: variables often have to be carried as "variant" types, you have to plug into the runtime library (which has a specific C API), and so on.
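A sketch of what those "variant" types look like (my own minimal version): every dynamic value becomes a tagged enum, and each operation dispatches on the tags at run time, which is where the generated code gets slow and bulky.

```rust
// Minimal dynamic value; a real transpiler would need many more tags
// (floats, arrays, hashes, nil, objects...).
#[derive(Debug, PartialEq)]
enum Variant {
    Int(i64),
    Str(String),
}

// Even a simple `a + b` turns into a run-time type dispatch.
fn var_add(a: &Variant, b: &Variant) -> Option<Variant> {
    match (a, b) {
        (Variant::Int(x), Variant::Int(y)) => Some(Variant::Int(x + y)),
        (Variant::Str(x), Variant::Str(y)) => Some(Variant::Str(format!("{x}{y}"))),
        _ => None, // the source language would raise a TypeError here
    }
}
```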
Facebook did this for some time with their "HipHop for PHP" transpiler, which translated PHP into C++ and then fed it to gcc. That worked, but all the type conversion and similar magic required on every call led to compilation times of 24 hours and interfaces that were hard to interact with from language bindings, so they went back and wrote a scripting engine (HHVM) which now runs their forked PHP.
Not usefully, since those languages require an extreme level of dynamism (even if rarely used by practical programs). E.g. Python requires that an object's methods are a dictionary that can be accessed and manipulated as one, so you'll never be able to translate Python objects into Rust traits. You could of course write a Python interpreter in Rust.
Why is there such a push to use this new language that's so inaccessible to much of the world? Not everyone has a multi-gigahertz, multi-core, multi-gigabyte computer to develop in Rust. It seems like unnecessary stratification for nebulous gain.
Perhaps the Rust team should make their compiler work on reasonable resources, like 32 bit processors, before marching off to try to convert the world.
Or perhaps we should have rust2c instead, so people can take all this Rust code that everyone is pushing and compile it on modest machines.
The compiler performance may be a perfectly valid reason not to use Rust. Don't mistake the frequent Rust articles for the whole world moving to Rust. It's more like a few excited pioneers exploring its use cases and doing foundational work. Besides, if it helps other people build more stable, secure, or better performing software tools for you to use, isn't that good, even if you don't use it directly yourself?
Unless you're talking about people holding on to hardware from 20+ years ago then yes - basically everyone with a computer, or a mobile phone has something that fits that classification.
People complain about Rust having long compile times, but I doubt it would be unusably bad even on something like a core 2 duo.
This is not entirely a rustc issue; LLVM is a huge part of this as well. Sometimes, some steps need more than four gigs of RAM. loc says that rustc has 1.4 million lines of Rust, and LLVM has 5 million of C++. It's just a huge project.
You would only need cross compilation if your project is as big as rustc, but if it were, you would likely want the fastest available machine at your disposal anyway, and a lot of effort is being poured into making cross compilation as painless as possible.
Making rustc compilable on a 32-bit host might be a good idea, but it is far from a priority. If that is a deal breaker, as it is for OpenBSD for example, Rust is certainly not a language you should look at in the short term.
Full Build: time cargo build (1 min 42 seconds without download) (349K LOC total [0])
Recompile: time cargo build (2 seconds) (around 18k LOC)
I don't see the problem. The full build also includes all the 53 dependencies. So we are talking about a compilation speed of about 3.5k lines/s on a dual core 11" laptop from 2017 with turbo boost disabled.
I've personally worked on C++ projects where changing a single header can cause 5 minutes of compilation on an incremental build. Any slowdowns you have experienced may have been caused by a specific library or project.
[0] according to cloc ~/.cargo/registry/src/github.com-1ecc6299db9ec823/
Not everyone is a rich American or European, many people live in what would be considered a poor country. They might have a core2duo if very lucky. The entitlement on this website is getting out of hand.
First, Rust provides safety checks you don't get in C. Second, the only way you'd improve compiler performance via C conversion would be to bypass these checks. Third, transpiling through C without these safety checks defeats the point of Rust in the first place.
The fact is, that there are a great many bugs in the wild that never would have happened in idiomatic rust in the first place. Yes, there is additional overhead to compiling. By comparison, if you use static code analysis on your C/C++ code there is additional overhead, and even then a greater chance of security bugs slipping by compared to rust.
In the end, used server hardware is relatively cheap in most of the world, which does pretty well with this type of code. There are a lot of people using mid-upper range x86 hardware and even to cross-platform compile to lower end arm64 support. At some point, it's less worth maintaining legacy hardware either for pragmatic and/or practical reasons.
Rust was started well after x64 and other 64-bit hardware was in the wild, and 32-bit hardware was for the most part no longer being produced. Supporting such systems is work that practically speaking is less valuable to most people.
As someone that really likes the old DOS based BBS software and related tech, and even supports Telnet BBSes today, I don't always like seeing this left behind to some extent. Last year, a large portion of the community blew up when 32bit was going to be dropped from Ubuntu, mainly because of the need required by wine and the gaming community.
This isn't about supporting 10+ year old apps and programs in an emulated environment... this is about porting existing code in an automated enough way to improve stability and supportability over time by leveraging a safer language.
> Perhaps the Rust team should make their compiler work on reasonable resources, like 32 bit processors, before marching off to try to convert the world.
It's time to let 32-bit processors go and move on.