I'm amused that the author reached a satisfactory result of "only" 7 hours, only to later realize that they had forgotten to compile their Rust code with optimizations; fixing that brought the runtime down to 30 minutes. I've never used Julia, but unoptimized Rust binaries contain such naive and suboptimal codegen that I assume something must have been wrong with their Julia code; my ballpark for unoptimized Rust is that it's roughly on par with interpreted (not JITted) Ruby or Python (parallelism notwithstanding, which bears much more fruit in Rust than in Ruby/Python even without optimizations).
Still, at the same time, we can judge a language based on how easy it is to fall into the pit of success rather than the pit of failure; if their Julia code had a problem, then maybe the Julia developers could tweak some defaults (or change some documentation) so that people in the future do not fall afoul of the same result. The same sort of thing exists in Rust as well, where the fact that I/O is unbuffered by default makes it perform poorly in microbenchmarks that involve printing a zillion times in a loop: https://nnethercote.github.io/perf-book/io.html#buffering
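To illustrate, the standard fix is to wrap stdout in a `BufWriter` so that many small writes coalesce into one large write instead of one syscall per line (a minimal sketch; the helper name is my own):

```rust
use std::io::{self, BufWriter, Write};

// Writing through a BufWriter batches many small writes into one
// large write to the underlying sink, avoiding a syscall per line.
fn write_lines<W: Write>(sink: W, n: usize) -> io::Result<()> {
    let mut out = BufWriter::new(sink);
    for i in 0..n {
        writeln!(out, "{}", i)?;
    }
    // Flush explicitly so nothing is left sitting in the buffer.
    out.flush()
}

fn main() -> io::Result<()> {
    // Lock stdout once and buffer it, instead of locking and
    // flushing on every println! call.
    write_lines(io::stdout().lock(), 5)
}
```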
The biggest performance trap they had was copying all their strings in a really hot loop to a vector of characters. I'm not sure what we could do to steer people away from that...
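I don't know the exact code, but the shape of the trap is presumably something like this hypothetical sketch: collecting each string into a fresh `Vec<char>` inside the hot loop, versus iterating the chars in place:

```rust
// Hypothetical sketch of the trap: allocating a fresh Vec<char>
// for every word in a hot loop.
fn count_matches_slow(words: &[&str], target: char) -> usize {
    let mut n = 0;
    for w in words {
        // Allocates and copies the whole string on every iteration.
        let chars: Vec<char> = w.chars().collect();
        n += chars.iter().filter(|&&c| c == target).count();
    }
    n
}

// The same loop, iterating the chars in place with no allocation.
fn count_matches_fast(words: &[&str], target: char) -> usize {
    words
        .iter()
        .map(|w| w.chars().filter(|&c| c == target).count())
        .sum()
}

fn main() {
    let words = ["crane", "slate", "adieu"];
    assert_eq!(count_matches_slow(&words, 'a'), count_matches_fast(&words, 'a'));
}
```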
I've been teaching people to use Rust since 2011. It is quite common for someone coming from an interpreted language background to not realize that they have to manually ask for optimizations when building their code. In such circumstances they often come asking for help saying "I must be doing something wrong, I rewrote this Python code in Rust and it appears to be slower somehow?" For a long time on the /r/rust sidebar we even had a note saying "before asking for help, have you tried compiling with --release?" And indeed, once they turn on optimizations their code goes from as slow as Python to as fast as C. The OP's case is completely representative; a 14x improvement merely by compiling in release mode is entirely in line with what I have observed.
This may be surprising to people coming from C or C++, where unoptimized code is slower, but not that slow. But the point of Rust is that it has selectively provided as many zero-cost abstractions as it can, which, when paired with a modern, sufficiently smart backend like LLVM, boil away into nothing. It's a great feeling the first time you crack open the assembly output for a highly abstract chain of iterators and closures, only to find that the whole shebang has been reduced to a handful of fixed-form arithmetic operations without a loop or a function call in sight. But the tradeoff is that you do have to perform the step of boiling those abstractions away.
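For a concrete taste, here's the kind of chain I mean; with optimizations on, LLVM routinely reduces a pipeline like this to straight-line arithmetic, or a constant when the bounds are known (a sketch, not a guarantee for every chain):

```rust
// A highly abstract pipeline: range -> map -> filter -> sum.
// In release builds LLVM commonly compiles this down to
// straight-line arithmetic with no loop or closure calls left.
fn sum_of_even_squares(n: u64) -> u64 {
    (1..=n).map(|x| x * x).filter(|x| x % 2 == 0).sum()
}

fn main() {
    // Even squares up to 10^2: 4 + 16 + 36 + 64 + 100 = 220.
    assert_eq!(sum_of_even_squares(10), 220);
}
```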
This is not a new problem, actually. When IBM built their PL.8 research compiler for RISC and safe systems programming, they took exactly the same approach as Rust, just in 1982.
It's quite an interesting paper, in case you haven't read it: "An Overview of the PL.8 Compiler".
I don’t think there are C compilers out there that do not optimize at all.
AFAIK, all of them will do some register assignment, some constant folding, etc.
For example, if you write

    void f(int a)
    {
        int b = a;
        b = b + b;
        g(b);
    }

are there compilers out there that compile that to

    - load a
    - store into b
    - load b
    - load b into another register
    - add
    - store into b
    - load b
    - call g
    - return

?
Languages that do overflow and range checks, such as Julia, and Rust in debug mode, would have to add lots of those checks there if they did not do any optimization.
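To make that concrete: in a Rust debug build, plain integer `+` compiles to a checked add that panics on overflow, while release builds omit the check by default and wrap. The explicit methods below spell out both behaviors in any build:

```rust
fn main() {
    // In debug builds, `a + b` on integers is an overflow-checked
    // add that panics on wrap; in release builds the check is
    // omitted by default and the value wraps modulo 2^8.
    let a: u8 = 200;
    let b: u8 = 100;

    assert_eq!(a.checked_add(b), None);  // overflow detected
    assert_eq!(a.wrapping_add(b), 44);   // 300 mod 256
}
```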
I like Rust a lot but this is a big frustration with it. I have a hard time seriously imagining Rust taking off in game development when debug code is so bad, at least without some style guidelines that avoid the worst problems with it.
I'm glad Zig is providing some competition here that's more in line with what one would expect from C.
There is also a trick to compile your own program without optimizations while optimizing all dependencies.
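If I remember correctly, the trick is a Cargo profile override; something like this in `Cargo.toml` (assuming a reasonably recent Cargo) keeps your own crate unoptimized for fast edit-compile cycles while building every dependency at full optimization:

```toml
# Optimize every dependency even in dev builds,
# while leaving the workspace's own crates unoptimized.
[profile.dev.package."*"]
opt-level = 3
```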
I too had the "Un-optimized Rust code is slow!" experience when I started out, because my first project required PNG encoding and decoding, and the png crate was incredibly slow without optimizations.
By contrast, it's typical for C and C++ programs to be un-optimized at the level of `main` while calling into heavily-optimized system libraries like ffmpeg.
One promising avenue of Rust development is that, while --release implies -O2 and the default debug profile implies -O0, the -O1 middle ground hasn't received a lot of attention. In the future, the contexts you mention seem like they would benefit from such a sweet spot.
One of the early advent of code challenges triggered this situation for many people (including myself). The same program compiled in release mode was more than 4000x faster than debug mode.
Although it's worth mentioning that LLVM understands math and might be replacing an inefficient sum calculation with an efficient algorithm, so it might not be all down to codegen.
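The classic instance is a naive summation loop, which LLVM's loop analyses can rewrite into the closed-form Gauss formula, making it O(1) regardless of n (a sketch; whether the rewrite fires depends on the exact loop shape):

```rust
// A deliberately naive running sum. With optimizations on, LLVM's
// induction-variable analysis commonly replaces loops like this
// with the closed form n * (n + 1) / 2, so the "loop" disappears.
fn naive_sum(n: u64) -> u64 {
    let mut total = 0u64;
    for i in 1..=n {
        total += i;
    }
    total
}

fn main() {
    let n = 1_000;
    // Matches the closed-form result either way.
    assert_eq!(naive_sum(n), n * (n + 1) / 2);
}
```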
That’s true; I hadn’t realized how large the gap between debug and release builds is in Rust. Most of what I do in Rust is not performance-sensitive enough for debug performance to be an issue.
A tradeoff in the other direction, though, is compilation time. Currently the master branch takes 15 minutes to compile in release mode. It’s probably due to the hardcoded root choices, and I assume compilation would be faster if they were loaded from a string instead of a structure, but it is a bit of a gotcha.
Surely you don't mean the code in the OP takes you 15 minutes to compile in release mode; it's a 100-line file with no dependencies and no macros or generics to hide codegen behind, and it shouldn't take more than an instant to compile. To get a 15-minute release build, you'd either need a very large amount of code, or you may be attempting extensive type-level programming that rustc was not designed to support. You may also want to try swapping out your linker; replacing the system linker with gold or lld can have dramatic results if you're bottlenecked on the link stage.
Not the code linked from the article; the goal of that code was to generate the scores of the guesses at the root of the search tree, so that I could hardcode them in.
It is possible that there is a recommended way to do it differently which I missed. I tried lazy_static!, but ended up having to fight the type system, and the related GitHub issues didn’t bring me hope that I could overcome it easily.
Interesting, you appear to be hitting some kind of pathological case with the `vec!` macro. Apparently it doesn't like being used with a 15,000-line literal. :P Fortunately you're right, there is a different way to do this, which AFAICT doesn't suffer from the same pathology. Swapping out the `vec!` invocation brought the time of `cargo clean && cargo build --release` down from 345 seconds to 13 seconds. I consider this a compiler bug; I'll file it if I can't find an existing issue.
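A sketch of the kind of replacement that avoids the macro (the names and data here are invented for illustration): move the literal into a `static` array, which the compiler handles as plain data, and convert it to a `Vec` at runtime:

```rust
// Hypothetical shape of the data; the real table had ~15,000 entries.
// A `static` array is emitted as plain data in the binary,
// sidestepping whatever the huge `vec!` expansion was doing
// at compile time.
static ROOT_SCORES: [(u16, u32); 4] = [(0, 3), (1, 17), (2, 5), (3, 9)];

fn root_scores() -> Vec<(u16, u32)> {
    // One allocation and one memcpy at startup, instead of a
    // pathological compile-time expansion.
    ROOT_SCORES.to_vec()
}

fn main() {
    assert_eq!(root_scores().len(), 4);
}
```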