Hacker News

In general, if Rust is significantly slower than equivalent C, it's a bug. We do care about bugs, and if you try some stuff and find them, please file them. (We track them via this tag: https://github.com/rust-lang/rust/labels/I-slow)

Embedded can be tricky at times; we've had a working group pushing on making stuff great all year. There's still a lot of work to be done. Sometimes you need to opt into nightly-only features. We'll get there!

(One major example is platform support: since we're built off of LLVM, we may not have a backend for more obscure architectures. ARM stuff works really well, generally. AVR is down to some codegen bugs...)




Are there theoretical boundaries that prevent Rust from ever reaching C performance or is this just a practical question of compiler optimization?


It really depends on exactly what you mean.

There's one significant thing that's in C that's not in Rust: alloca. If you need that, well, we've talked about it, but haven't accepted a design, and some people never want to gain it directly.
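To illustrate what Rust code does instead — a sketch (the function and size threshold are hypothetical): where C would `alloca` a runtime-sized stack buffer, Rust typically either bounds the size and uses a fixed stack array, or falls back to the heap.

```rust
// Where C might use alloca(len), Rust usually picks between a bounded
// stack buffer and a heap allocation, chosen at runtime.
fn sum_zeroed(len: usize) -> u32 {
    const MAX_STACK: usize = 256;
    if len <= MAX_STACK {
        let buf = [0u8; MAX_STACK];       // stack: fixed upper bound
        buf[..len].iter().map(|&b| u32::from(b)).sum()
    } else {
        let buf = vec![0u8; len];         // heap: truly runtime-sized
        buf.iter().map(|&b| u32::from(b)).sum()
    }
}
```

Crates like smallvec package up exactly this pattern.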

EDIT: another comment pointed out goto; I have no idea what the performance implications are but I’ll have to add that to my list of “stuff C has that Rust doesn’t.”

Sometimes, you need nightly-only features. This means that, in some sense, today's Rust is slower, but tomorrow's Rust may not be.

Sometimes, "equivalence" is the issue. You can turn off bounds checking for array access, for example, but most Rust code doesn't, for hopefully good reason. Sometimes those checks are elided, and so it's exactly the same. Sometimes they're not. Sometimes they're not and that inhibits other optimizations. Is unsafe Rust "equivalent" to the C? Or does that not count? (Unsafe Rust is not always faster than Safe Rust...)
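To make the "equivalence" question concrete, here's a sketch (functions are hypothetical): the same loop with checked indexing, and with the check opted out via `unsafe`.

```rust
// Safe version: xs[i] carries a bounds check, though LLVM can often
// prove i < xs.len() here and drop it.
fn sum_checked(xs: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i];
    }
    total
}

// "Equivalent to C" version: no check is ever emitted, and an
// out-of-range index would be undefined behavior.
fn sum_unchecked(xs: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += unsafe { *xs.get_unchecked(i) };
    }
    total
}
```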

At the theory level, I can't think of anything off the top of my head that would inherently make Rust slower than C, generally speaking. There's also some degree of argument that in theory, Rust should be faster than C, thanks to how much more we know about aliasing, etc. That's a whole other can of worms...


> Sometimes those checks are elided, and so it's exactly the same. Sometimes they're not.

This is something I've personally thought about a lot and I have no idea how I'd actually want it implemented either on the backend or in the syntax, but I think it'd be useful to have some way to say "I want these bound checks to be elided, if they can't be then make that an error". Similar to how you'd decorate a pure function in other languages to say that it can't cause side effects or depend on side effects for what it's doing.

I think that's a high-level ask that's probably significantly more difficult than it initially seems, too.
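There's no "elide or error" annotation today, but a well-known partial workaround is to hand the optimizer a single up-front fact it can use to kill the per-iteration checks (sketch; the function is hypothetical):

```rust
// The assert gives LLVM one branch to check up front; after it, both
// a[i] and b[i] are provably in range, so the per-iteration bounds
// checks can usually be removed.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = 0.0;
    for i in 0..a.len() {
        acc += a[i] * b[i];
    }
    acc
}
```

The catch is that whether elision actually happened is only visible in the generated assembly — which is exactly the gap an "error if not elided" feature would close.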


You probably just want a dependently typed language. Proving bounds at compile time is an introductory exercise for the likes of Idris.


Quite possibly that's what would work. I haven't tried any of them in any kind of serious capacity before.

Edit: Looks like there's been an RFC in the past to add it to rust, but it got punted because the const generics stuff went in first and they wanted to see how that would shake out before trying to fully tackle this. https://github.com/ticki/rfcs/blob/pi-types-ext-2/text/0000-...


ats [http://www.ats-lang.org/] is a C-level language with dependent types


Note that ATS has a limited version of dependent types as compared to Idris/Agda/Coq/etc. As I understand it, you can't run arbitrary (recursive, etc) functions at the type level. But even this 'lite' version lets you express preconditions like "n is a multiple of 4" or "the source array and destination array must not overlap in memory".


Yeah that'd be very interesting.


Can this sort of information be gotten out of LLVM?


> EDIT: another comment pointed out goto; I have no idea what the performance implications are but I’ll have to add that to my list of “stuff C has that Rust doesn’t.”

On that front, one interesting thought would be to reimplement the CPython bytecode interpreter in Rust and see if you can match the performance of C using computed goto.

In principle, there's no fundamental reason Rust couldn't optimize "loop around match" exactly the same way, without needing the computed goto. (For that matter, so could C.) Doing that would help cases like this.
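A minimal sketch of that "loop around match" dispatch shape (the opcode set here is made up, far simpler than CPython's):

```rust
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Halt,
}

// Classic interpreter shape in Rust: one loop, one match on the opcode.
// A computed-goto interpreter would instead jump directly from the end
// of each handler to the next one.
fn run(code: &[Op]) -> i64 {
    let mut pc = 0;
    let mut stack: Vec<i64> = Vec::new();
    loop {
        match code[pc] {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Halt => return stack.pop().unwrap_or(0),
        }
        pc += 1; // falls off the end (and panics) if code lacks a Halt
    }
}
```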


Well, the answer should be "no" for unsafe Rust, since you can mechanically translate any C to unsafe Rust and vice versa.

For safe Rust, sure, there are limitations: you can't turn off bounds checks, for instance. But all Rust is partially unsafe Rust, because the standard library uses unsafe code*. So it's a fuzzy distinction.

* You wouldn't want it any other way. Implementing low-level functionality as unsafe code allows us to write code, not compiler-code-that-generates-code.


In a sense you can mechanically translate between any Turing complete languages, but there's not a "naive" translation, so to speak, because C can have irreducible CFGs and Rust cannot. It's not clear to me that reflowing a CFG can always be done without a performance impact.


You can convert an irreducible CFG to a reducible one with a loop and a switch (or a match, in Rust's case).

In pseudocode:

  Goto = Start;
  while (Goto != End) {
    match (Goto) {
    ...
    }
  }
It's a pretty basic obfuscation technique, and compilers will happily unroll it, leaving you with the original irreducible CFG.
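Concretely, in Rust, that shape might look like this (block names and the arithmetic are invented for illustration):

```rust
#[derive(PartialEq)]
enum State {
    A,
    B,
    C,
    End,
}

// An (imagined) irreducible CFG with blocks A, B, C, re-encoded as a
// state machine: each match arm is a basic block that returns the
// successor block.
fn reloop_demo(mut n: i32) -> i32 {
    let mut state = State::A;
    while state != State::End {
        state = match state {
            State::A => if n > 10 { State::C } else { State::B },
            State::B => { n += 3; State::C }
            State::C => { n += 1; if n > 10 { State::End } else { State::B } }
            State::End => unreachable!(),
        };
    }
    n
}
```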


You have more faith in the compiler than I do. Here's a stupid test I made.

* In C++, using goto: https://godbolt.org/z/YqCMbU

* In Rust, with relooped CFG: https://godbolt.org/z/IIXtKo

The compiler was obviously not able to unroll the relooped code into the original CFG.

A specific example: in principle, the compiler could see the exact target of every continue in the Rust code, so the continue on line 34 could go directly to line 27 (or, better, as in C++ the basic blocks could just be laid out adjacently), but the compiler does not actually do this and there are a bunch of unnecessary tests on the path between those two lines.


It's unfortunate that the enum-based solution doesn't work, but that particular example can be made "optimal" (at least, as good control flow as LLVM can get this code: the same as clang, not GCC), by using a rather unpleasant series of loops and labelled breaks: https://godbolt.org/z/sQ1nH8 (rustfmt'd version: https://play.rust-lang.org/?gist=b1091e0c583b88f27bf1eaeae3a...).

This doesn't work in general, and is ... impossible to maintain, but it's not an unreasonable approach for generated code.


Note that Rust has "forward goto" in the form of "labelled break" (which can even carry a value), so I suspect some cases might not even need converting to a loop-match "state machine".
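For example (hypothetical function), a labelled break jumps forward out of nested loops directly, with no state-machine encoding needed:

```rust
// Find indices i < j with xs[i] + xs[j] == target.
fn find_pair(xs: &[i32], target: i32) -> Option<(usize, usize)> {
    let mut found = None;
    'outer: for i in 0..xs.len() {
        for j in (i + 1)..xs.len() {
            if xs[i] + xs[j] == target {
                found = Some((i, j));
                break 'outer; // "forward goto": skips the rest of both loops
            }
        }
    }
    found
}
```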


True, I forgot about goto.


Not only true goto but also switch and computed goto. AFAIK C/C++ are essentially the only games in town for these. No modern language has even corrected the purely syntactic limitation that prevents nested switches (by, e.g., allowing you to attach label names to switch blocks and cases).


note that when you use iterators, bounds checks do get elided.
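For instance, zipping two slices bounds the loop by the shorter length up front, so there's no per-element index left to check (sketch):

```rust
// zip caps iteration at the shorter slice, so every access is known
// to be in bounds and no per-element check is emitted.
fn add_into(dst: &mut [f32], src: &[f32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}
```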


Not qualified to answer, but I did notice recently that Rust appears to generate more compiled code than equivalent C, though I don't think this realistically has a negative performance impact. You can check this out on godbolt (https://godbolt.org/).


It's possible to set compiler flags to optimize for size. That said, I've also seen large binaries, and monomorphization can contribute to that. Some projects are taking that more seriously; for example, miniserde is explicitly designed to minimize monomorphization and thus improve both code size and compile time.
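For reference, a size-oriented release profile might look like this (these are the standard Cargo knobs; the particular values are just one reasonable combination):

```toml
[profile.release]
opt-level = "z"   # optimize for size instead of speed
lto = true        # link-time optimization: cross-crate inlining, dead-code removal
codegen-units = 1 # trade compile time for better optimization
panic = "abort"   # drop the unwinding machinery
```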


Hadn't heard of miniserde before, awesome find, thanks!!


Bounds checking will inhibit vectorization, and panicking will bloat code. Maybe I'll compare a Rust coreutils binary against something like sbase.


Regarding performance: For my first foray into rust, I recently tried reimplementing a simple C++ tool in rust. It reads pcap data from stdin, finds the vlan tag for each packet and writes packets to one of a few different files (or stdout) depending on the vlan id.

I know C++ quite well, and I'm confident that the C++ version is reasonably well optimized, though it probably has a little room for improvement.

In rust, I think I'm doing things in a reasonable fashion but so far the performance is only half of the C++ version. So, not bad, but I was hoping it would be closer. I'd like to know if anyone has any suggestions for resources related to rust optimization.


You can use your usual tools, like perf, on Rust.

What’s the situation with the IO? Are you buffering? Are you holding the lock for a long time or rapidly locking and unlocking?

If you can share the code I can take a look.


I realized that stdout was acquiring and releasing a mutex on every write operation; fixing that improved things. The good news is that when writing to /dev/null, it's now faster than the C++ version, but writing to stdout is still ~25% slower. I suppose there's something else suboptimal about the way I'm using stdout, but I haven't figured it out yet.

Code is here: https://gist.github.com/usefulcat/56f334bc58c97edb073b457b68...

There is actually one other file but it only contains a couple of struct definitions for the libpcap file and packet headers. Thanks for having a look!


Update: I wrapped the locked stdout handle in a BufWriter and now the rust version is >25% faster than the C++ version in all cases.

I didn't think there was room for that much improvement; I'm really impressed.
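The fix amounts to locking stdout once and wrapping the lock in a BufWriter — a sketch (`write_report` is a hypothetical stand-in for the packet-writing loop):

```rust
use std::io::{self, BufWriter, Write};

// Generic over Write, so the same loop serves stdout, a file, or a
// test buffer.
fn write_report<W: Write>(mut out: W) -> io::Result<()> {
    for i in 0..3 {
        writeln!(out, "packet {}", i)?;
    }
    out.flush() // BufWriter also flushes on drop, but drop swallows errors
}

// Lock stdout once and buffer it: each write becomes a memcpy into an
// internal buffer instead of a mutex acquisition plus a write syscall.
fn report_to_stdout() -> io::Result<()> {
    let stdout = io::stdout();
    write_report(BufWriter::new(stdout.lock()))
}
```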


If you have a blog, that would make for an amazing entry, going from 50% slower to 25% faster.

Edit: HOLY CRAP! I have a program written in Rust, ppbert[1], and I just tried wrapping my StdoutLock object in a BufWriter, and I improved the performance of my pretty-printing by a factor of 2x! I knew to use BufReader for files, I didn't know it was helpful for stdin and stdout! Thank you _so much_ for sharing your experience, I've certainly benefited!

    Benchmark #1: ppbert -2 *.bert2
      Time (mean ± σ):      3.816 s ±  0.115 s    [User: 2.494 s, System: 1.321 s]
      Range (min … max):    3.688 s …  4.028 s

    Benchmark #2: ppbert-dev -2 *.bert2
      Time (mean ± σ):      1.728 s ±  0.045 s    [User: 1.493 s, System: 0.234 s]
      Range (min … max):    1.678 s …  1.843 s

    Summary
      'ppbert-dev -2 *.bert2' ran 2.21x faster than 'ppbert -2 *.bert2'
[1] https://github.com/gnuvince/ppbert


Awesome! Glad you got it sorted. Those two things are always footguns...


Can you share the code? Are you using the `--release` flag to build/run it?


Yes I'm using --release.


Hey Steve, thanks for commenting here. As a person who is trying to get into embedded development but who knows very little, can you explain the challenges of having a language support most/all embedded platforms? I'm guessing you would need to support old versions of gcc, or something like that? Or be able to compile things with each platform's special flags?

It seems like even C projects have a hard time being portable. People even avoid cmake because of the large dependency and the fact that it is cross platform until it isn’t.


The first issue is to have a compiler backend for that target. Someone has to write that. gcc supports more targets than LLVM.

Then, you have to make sure that it doesn't break; this means running on that target in CI, somehow. Given that you're already talking about devices that may not even have an OS... emulation can work sometimes?

Then, there's ecosystem stuff. You probably want some sort of HAL and support for not just one platform, but all of the platforms you're deploying to. So that's more work...

Finally, some platforms are proprietary and basically give you their own fork of gcc and so C is pretty much your only option anyway.


Not Steve, but I do know a thing or two about embedded and portable C code.

First of all, most embedded development makes use of bare metal, where the libraries take the role of an OS, or they use a specialized OS from the hardware vendor.

Just using pure ANSI C isn't possible, because the standard does not expose the hardware features from the underlying platform, so the alternatives are to use Assembly, or language extensions.

Naturally language extensions are more convenient to use, so that is what most developers end up doing.

Also there are many types of embedded platforms; you can be targeting anything from a tiny PIC with 8 KB of flash to a powerful multi-core ARMv8-A with 8 GB of RAM.

So the toolchain must allow for customization of what actually gets linked into the final binary, and the runtime must be as thin as possible.

Then there is the drivers story: each vendor gives you their own SDK, which most of the time is the only way to access their devices.

It is typical for open source projects to reverse engineer some of those SDKs to get the necessary information for linker maps, compiler flags and driver information.

Regarding Rust, there is an ongoing effort to create an embedded library for hardware drivers, as a means to write portable code.

https://github.com/rust-embedded/awesome-embedded-rust


For some of the unsupported targets there is the Rust->C compiler. Haven't used it myself but I've seen that, for example, the ESP32/ESP8266/Xtensa is supported by it.


That's true, but the issue with it as a real development option is that its job was to bootstrap the compiler, and so it implements Rust 1.19 (I believe...), and nothing further. So you're missing out on a lot of useful stuff.



