
Rust Performance Pitfalls - blacksmythe
https://llogiq.github.io/2017/06/01/perf-pitfalls.html
======
agwa
> To get rid of the checks, we can either use bytes directly (usually via
> Vec<u8> / &[u8]) or, if we are absolutely sure the input will be valid
> UTF-8, use str::from_utf8_unchecked(_) (note that this will require unsafe
> and break your code in surprising ways should the input not be valid UTF-8).

I believe this needs a stronger warning. Functions that operate on strings are
allowed to assume that their input is valid UTF-8 and may perform out-of-
bounds memory accesses if the input is not valid UTF-8. Therefore, creating a
string containing invalid UTF-8 can lead to a memory safety vulnerability. The
suggestion to use this function seems out-of-place in this article, which
otherwise avoids suggesting unsafe code (e.g. indexing arrays without bounds
checking).
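For contrast, the checked conversion makes the failure mode explicit rather than undefined; a minimal sketch:

```rust
// Checked counterpart to from_utf8_unchecked: validation happens up
// front, and invalid input surfaces as an Err instead of undefined
// behavior deep inside some later string operation.
fn as_text(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}
```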

~~~
kibwen
What sort of language beyond "absolutely sure", "require unsafe", and "break
your code in surprising ways" would you recommend to make this warning
stronger?

~~~
Anderkent
'break your code in surprising ways' is super vague. Saying 'may cause out of
bounds memory access/writes should the input not be valid UTF-8' explicitly is
probably more scary.

~~~
steveklabnik
The nature of undefined behavior is in fact super vague; it's not actually
possible to say what will happen.

~~~
infogulch
While technically true, a specific example helps the reader conceptualize what
level of danger that means, when they don't already have a concrete
understanding like you do. Recall that the breadth of experience across Rust's
userbase is much wider than e.g. C's.

~~~
steveklabnik
Oh yeah, I mean, I'm not saying I disagree with adding detail, I can just see
the impulse to not, since anything you say may or may not be true.

~~~
infogulch
_me talking about rust's userbase_

steveklabnik's profile: "Rust core team"

I'm guessing you know that a lot better than I do lol.

~~~
steveklabnik
It's all good! You're absolutely right, and it's easy to forget when you're
inside a bubble.

------
Lerc
I have been doing some exploration of how well Rust optimizes Iterators and
have been quite impressed.

Writing an iterator that yields the individual bits supplied by an iterator of
bytes means you can count them with

    
    
        fn count_bits<I: Iterator<Item = bool>>(it: I) -> i32 {
            let mut a = 0;
            for i in it {
                if i { a += 1 }
            }
            a
        }
    

Counting bits in an array of bytes would need something like this

    
    
        let p:[u8;6] = [1,2,54,2,3,6];
        let result = count_bits(bits(p.iter().cloned()));
    

Checking what that generates in asm
[https://godbolt.org/g/iTyfap](https://godbolt.org/g/iTyfap)
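The `bits` adapter itself isn't shown in the comment; a hypothetical MSB-first sketch consistent with the 128-initialized mask in the generated asm might look like:

```rust
// Hypothetical sketch of the `bits` adapter (not from the original
// comment): yields each byte's bits most-significant-first as bools.
struct Bits<I> {
    inner: I,
    current: u8,
    mask: u8,
}

fn bits<I: Iterator<Item = u8>>(inner: I) -> Bits<I> {
    Bits { inner, current: 0, mask: 0 }
}

impl<I: Iterator<Item = u8>> Iterator for Bits<I> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        if self.mask == 0 {
            // mask exhausted: fetch the next byte, restart at the top bit
            self.current = self.inner.next()?;
            self.mask = 128;
        }
        let bit = self.current & self.mask != 0;
        self.mask >>= 1;
        Some(bit)
    }
}
```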

The core of the code is

    
    
        .LBB0_4:
            mov     esi, edx        ;edx has the current mask of the bit we are looking at
            and     esi, ecx        ;ecx is the byte we are examining
            cmp     esi, 1          ;check the bit to see if it is set (note using carry not zero flag)
            sbb     eax, -1         ;fun way to conditionally add 1
        .LBB0_1:
            shr     edx             ;shift mask to the next bit
            jne     .LBB0_4         ;if mask still has a bit in it, go do the next bit otherwise continue to get the next byte 
            cmp     rbx, r12        ;r12 has the memory location of where we should stop.   Are we there yet?
            je      .LBB0_5         ; if we are there, jump out. we're all done
            movzx   ecx, byte ptr [rbx]  ;get the next byte
            inc     rbx             ; advance the pointer
            mov     edx, 128        ; set a new mask starting at the top bit
            jmp     .LBB0_4         ; go get the next bit
        .LBB0_5:
    

Apart from magical bit counting instructions this is close to what I would
have written in asm myself. That really impressed me. I'm still a little wary
of hitting a performance cliff: I worry that I can easily add something that
will make the optimiser bail on the whole chain, but so far I'm trusting Rust
more than I have trusted any other optimiser.

If this produces similarly nice code (I haven't checked yet) I'll be very happy

    
    
       for (dest,source) in self.buffer.iter_mut().zip(data) { *dest=source }
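
For the slice-to-slice case there's also `copy_from_slice`, a guaranteed memcpy (note it panics unless the lengths match, whereas `zip` simply stops at the shorter side); a sketch of both:

```rust
// Two ways to fill a buffer from a source slice.
fn fill_zip(buffer: &mut [u8], data: &[u8]) {
    // zip stops at the shorter of the two iterators: no panic possible,
    // and the optimizer usually recognizes this loop as a memcpy.
    for (dest, &source) in buffer.iter_mut().zip(data) {
        *dest = source;
    }
}

fn fill_memcpy(buffer: &mut [u8], data: &[u8]) {
    // guaranteed memcpy, but panics if the lengths differ
    buffer.copy_from_slice(data);
}
```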

~~~
jblow
Okay, wait. Every modern CPU has a popcount instruction, so any hand-coded
implementation would use that, meaning the compiler output is actually pretty
bad in an absolute sense.

But if you find popcount too "magical", the commonly-known fast way to count
bits is via masking, shifts and adds, so that you do it in log(n) steps. Which
also would perform much better than this solution.

So what you're really saying is "the compiler managed to make a pretty
efficient representation of the naive solution" which is fine but it does not
mean your code is fast.
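In Rust terms, both of jblow's suggestions roughly correspond to `count_ones()`, which LLVM lowers to `popcnt` where the target supports it and to a bit-twiddling fallback otherwise; a sketch:

```rust
// Per-byte popcount via count_ones(): LLVM emits the popcnt
// instruction when the target feature is available, or a branch-free
// mask/shift/add fallback (the log(n) trick) when it isn't.
fn count_bits(bytes: &[u8]) -> u32 {
    bytes.iter().map(|b| b.count_ones()).sum()
}
```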

~~~
kbenson
> Okay, wait. Every modern CPU has a popcount instruction

What do you consider a "modern CPU"? Atom chips sold less than a decade ago
didn't support popcnt. AMD shipped some C-series chips without support for it
as recently as 2012. The low-end chips were likely to be sold later without
refreshes to newer features in some cases, and those are also likely the ones
to be repurposed for small and cheap x86 devices later. I wouldn't want the
default compilation settings without specifying CPU extensions to use an
extension that might not exist on my target platform.

~~~
jblow
This is why you have a fallback to a generic version.

------
Analemma_
I'm not a compiler expert, but it seems like some of these should be
unnecessary, especially with Rust's strong knowledge of types and ownership
lifespans. Like this example from the article:

    
    
        let nopes : Vec<_> = bleeps.iter().map(boop).collect();
        let frungies : Vec<_> = nopes.iter().filter(|&&x| x > MIN_THRESHOLD).collect();
    

where he recommends avoiding the first collect(). Can't the optimizer do that
for you if you don't do anything else with nopes?
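With hypothetical stand-ins for the article's names, the single-collect version the author recommends looks like:

```rust
// boop and MIN_THRESHOLD are placeholders, not from the original code.
const MIN_THRESHOLD: i32 = 10;

fn boop(x: &i32) -> i32 {
    x * 3
}

fn frungify(bleeps: &[i32]) -> Vec<i32> {
    // One chain, one collect(): the intermediate `nopes` Vec
    // (and its heap allocation) never exists.
    bleeps.iter()
        .map(boop)
        .filter(|&x| x > MIN_THRESHOLD)
        .collect()
}
```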

~~~
pornel
On a high level it seems like it could, but collect() performs a heap
allocation (and size checks and reallocations for unknown-length iterators),
and that's probably too big a side effect for LLVM to ignore.

~~~
radarsat1
Does current Rust contain a way to mark functions as Pure?

~~~
kibwen
Not in the referentially-transparent sense. Most Rust functions meet almost
all of the practical criteria for purity (i.e. they don't mutate global state
(or any other state that was not explicitly passed in to the function) and
don't do I/O), but that's only a comfort for the programmer's ability to
reason about the code; this weakened notion of purity-by-default isn't enough
to allow the typical optimizations via purity (e.g. memoization).

~~~
stcredzero
_but that's only a comfort for the programmer's ability to reason about the
code_

A way to mark functions as _pure_ for this purpose would be great! Especially
if it's not as fraught as _const_ in C++.

~~~
steveklabnik
We actually did have this once, but it wasn't really worth it, so it was
removed.
[https://news.ycombinator.com/item?id=6940624](https://news.ycombinator.com/item?id=6940624)
is the HN discussion, but it looks like the link might now be wrong?

It was also a very, very long time ago, and so today's Rust might be different
enough that those reasons don't apply any more.

~~~
tormeh
The reason at that point was that there weren't any practical benefits and
that it was preferable to wait for a more general mechanism. The first claim
is dubious, but I can empathize with the second, as long as it doesn't become
tacked on.

Haskell can do cool optimizations that make it feel like magic sometimes.

~~~
Gankro
Note that a lot of these optimizations aren't necessary in Rust -- e.g. list
fusion is only valuable in Haskell because mapping over lists is exposed as a
one-shot operation that produces a new list. So `map map map list` is
conceptually building 2 whole lists of temporaries you don't care about (and
you often don't actually care about the last one either, in cases where you
just iterate over it and discard it).

Meanwhile mapping in Rust generally takes an iterator and produces another
iterator that will apply the given closure to the current element as it's
yielded. So the naive codegen for iter().map().map().map().collect() is
exactly what list fusion is trying to produce -- no temporary lists.

TL;DR: making "map" have the monadic `T[U] -> T[V]` signature is really
expensive. ¯\\_(ツ)_/¯
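A sketch of the Rust shape described above, where the whole chain compiles down to a single pass:

```rust
// Each map() only wraps the previous iterator in a new adapter;
// nothing executes until collect() drives the chain. All three
// closures run per element in one pass, with exactly one allocation
// at the end -- no temporary lists, which is what list fusion in
// Haskell works hard to recover.
fn triple_map(v: &[i32]) -> Vec<i32> {
    v.iter()
        .map(|x| x + 1)
        .map(|x| x * 2)
        .map(|x| x - 3)
        .collect()
}
```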

------
blinkingled
> for i in 0..(xs.len()) { let x = xs[i]; // do something with x }

> should really be this:

> for x in &xs { // do something with x }

I am curious why the compiler can't rewrite the former to the latter?

~~~
pcwalton
Because in the former case, the optimizer has to prove that the length of the
array cannot change during the body of the loop, while in the latter case,
that's guaranteed by the language semantics.
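A sketch of the two forms being contrasted:

```rust
// Indexed form: each xs[i] carries a bounds check that LLVM must
// prove redundant (i.e. that xs.len() is loop-invariant and i stays
// below it) before it can be removed.
fn sum_indexed(xs: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i];
    }
    total
}

// Iterator form: the slice iterator cannot outrun the slice by
// construction, so there is no per-element check to eliminate.
fn sum_iter(xs: &[i32]) -> i32 {
    let mut total = 0;
    for &x in xs {
        total += x;
    }
    total
}
```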

~~~
mindleyhilner
Doesn't `let x = xs[i]` immutably borrow from xs? So for the duration of x's
lifetime (which is the entire for-body block), xs cannot be changed and
therefore its length must remain the same.

Though this might be information that rustc knows about but not LLVM.

~~~
pcwalton
In the case of "let x = &xs[i]" it would immutably borrow. But we'd need more
MIR optimizations to make use of that fact.

------
smitherfield
_> Sometimes, when changing an enum, we want to keep parts of the old value.
Use mem::replace to avoid needless clones._

    
    
        use std::mem;
    
        enum MyEnum {
            A { name: String, x: u8 },
            B { name: String }
        }
    
        fn a_to_b(e: &mut MyEnum) {
    
            // we mutably borrow `e` here. This precludes us from changing it directly
            // as in `*e = ...`, because the borrow checker won't allow it. Therefore
            // the assignment to `e` must be outside the `if let` clause. 
            *e = if let MyEnum::A { ref mut name, x: 0 } = *e {
    
            // this takes out our `name` and puts in an empty String instead
                // (note that empty strings don't allocate).
                // Then, construct the new enum variant (which will 
                // be assigned to `*e`, because it is the result of the `if let` expression).
                MyEnum::B { name: mem::replace(name, String::new()) }
    
            // In all other cases, we return immediately, thus skipping the assignment
            } else { return }
        }
    

Don't get me wrong, I think Rust is an incredibly impressive language, but
this is _nuts._ For the first time ever, I had a moment of appreciation for
the "simple clarity" of C++11 move constructors. If it weren't for the
comments and the documentation[1] I wouldn't have had the slightest clue what
this code is doing (a hack to fool the borrow checker while allowing for
Sufficiently Smart Compilation to a no-op[2] ... I think).

This is a good example of the main conceptual aspect of Rust where I feel it
could use improvement.[3] A lot of its features marry very high-level concepts
(like algebraic data types) to exposed, low-level implementations (like tagged
unions). Now, there's nothing wrong with the obvious choice to implement ADTs
as tagged unions, but the nature of Rust as a language that exposes low-level
control over allocation and addressing, in combination with the strictures of
the borrow checker, means enums and other high-level features live in a sort
of uncanny valley, falling short of either the high-level expressiveness of
ADTs or the low-level flexibility of tagged unions (without expert-level
knowledge or using `unsafe`).

Similarly, the functional abstractions almost feel like a step backwards from
the venerable C++ <algorithm>, abstraction and composition-wise. Very nitty-
gritty, low-level implementation details leak out like a sieve — you can't
have your maps or folds without a generous sprinkling of `as_slice()`,
`unwrap()`, `iter()` / `iter_mut()` / `into_iter()` and `collect()`
everywhere, although at least for this case you would be able to figure out
the idioms by reading the official docs. But nevertheless it seems like
reasonable defaults could be inferred from context (with the possible
exception of `collect()` since it allocates), while still allowing
explicitness as an option when you want low-level control over the code the
compiler generates.

In this case, enums are a language-level construct, not a library, so the
borrow checker really shouldn't be rejecting reasonable patterns. Aliasing
between the structural _and_ nominal common subsequence of several members of
an enum should be the default behavior, not something that requires a
nonobvious, unreadable hack such as the above. At the very least that behavior
(invaluable for many low-level tasks) should be easy to opt in to with e.g. an
attribute.[4]

[1] [https://doc.rust-lang.org/std/mem/fn.replace.html](https://doc.rust-lang.org/std/mem/fn.replace.html)

[2] Or rather since it is a tagged union, in MASMish pseudocode something like

    
    
            jnz x_nonzero
            mov MyEnum.B, e.tag
        x_nonzero:
            ret
    

Although it could still be a no-op depending on what else Sufficiently Smart
Compiler/LLVM inlines.

[3] And I'm not saying I have the solution or even that all-around-better
solutions exist.

[4] Something like

    
    
        #![safe_alias_common_subsequence(structural)]
        #![safe_alias_common_subsequence(nominal_and_structural)]

~~~
pcwalton
One problem with what you propose (as I understand it) is that it's not safe
to have uninitialized data in the presence of panics. A panic can cause
control flow to unwind and destructors to be invoked, and if you have
partially uninitialized data you have a recipe for use after free. In C++ you
just get use-after-free hazards everywhere (which is one of the many reasons
why exception safety in C++ is extremely hard).

Another problem with your proposal is that it's not safe to access the
structural and nominal common subsequence of several members of an enum in the
same way, because the Rust compiler can and will reorder the fields
differently in different variants in order to fill padding.

That said, it sounds like what you're looking for is one of the various
inheritance proposals, which have safe access to common enum fields as one of
the main guarantees. This would make this pattern much more ergonomic.

~~~
smitherfield
Ah, I wasn't aware that Rust reorders structure members. Although I still feel
it should be possible, whether the compiler just needs to change the tag or
has to shift data around.

What I was proposing was that common subsequences of enum fields be treated by
the compiler as synonyms, but accesses to uninitialized data would still be
illegal. One way this might be implemented is by, when necessary, creating
additional tags behind the scenes, e.g. MyEnum::__AB__ meaning A was
initialized but B is the active member, so only accesses to their common
subsequence are legal, MyEnum::__BA__ meaning the reverse, and so forth. Or it
could be restricted to fields that contain no members outside their common
subsequence with nontrivial destructors. Or the compiler could null out any
uninitialized references to such.

 _> That said, it sounds like what you're looking for is one of the various
inheritance proposals, which have safe access to common enum fields as one of
the main guarantees. This would make this pattern much more ergonomic._

Interesting. Link?

~~~
pcwalton
> What I was proposing was that common subsequences of enum fields be treated
> by the compiler as synonyms, but accesses to uninitialized data would still
> be illegal. One way this might be implemented is by, when necessary,
> creating additional tags behind the scenes, e.g. MyEnum::__AB__ meaning A
> was initialized but B is the active member, so only accesses to their common
> subsequence are legal, MyEnum::__BA__ meaning the reverse, and so forth. Or
> it could be restricted to fields that contain no members outside their
> common subsequence with nontrivial destructors. Or the compiler could null
> out any uninitialized references to such.

I feel like that's too much hidden behind-the-scenes magic for a systems
language, and I suspect the majority of the Rust community would feel the same
way, but feel free to file an RFC.

> Interesting. Link?

[https://github.com/rust-lang/rfcs/issues/349](https://github.com/rust-lang/rfcs/issues/349)

------
jancsika
> With some investment into optimizations, matching or exceeding C’s speed
> should be possible in most cases.

What class of algorithms is faster in Rust than it is in C?

~~~
pornel
I don't think it can be generalized like that.

Theoretically, Rust's memory ownership/aliasing rules are stricter and more
granular than C's restrict, so some pointer-heavy code could optimize better.
Rust is very good at inlining by default. Rust makes it easy to use stack-
allocated structures. But C _can_ do the same, it's just a matter of effort.

Both languages are low level enough that you can use them as a portable
assembly and endlessly tweak them to one-up the other. If some C code doesn't
optimize well, you can write more contorted code that will.

~~~
Gankro
Yeah the only thing that Rust might have over C, in terms of really optimized
implementations, is low-level idioms that C declares to be UB, but Rust
declares to be defined (generally to be what x64 hardware does).

Maybe something that leverages signed overflow, overlong shifts, or type
punning.

But if you have enough control over your codebase to mandate the
compiler/flags its built with, then you can generally tell the major C
compilers to act like Rust in these cases.
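For example, where C's signed overflow is UB, Rust defines it (a panic in debug builds, two's-complement wrap in release) and offers explicit wrapping operations; a sketch:

```rust
// Explicitly wrapping arithmetic: defined two's-complement behavior
// where the equivalent C signed overflow would be undefined.
fn hash_step(acc: i32, x: i32) -> i32 {
    acc.wrapping_mul(31).wrapping_add(x)
}
```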

That said, the expected win for Rust over C(++) _in practice_ is that you can
be more "reckless", because you have a stronger type system protecting you
from messing things up. A production-quality C(++) codebase might rightly do
more copies, use more reference counting, and use less concurrency just
because the risk of doing otherwise isn't worth the potential performance
wins.

Organizations have limited resources to commit to optimizing/verifying code.
Rust is intended to get you more bang for your buck.

~~~
duneroadrunner
> That said, the expected win for Rust over C(++) in practice is that you can
> be more "reckless", ...

I'm glad to see somebody articulate this observation. SaferCPlusPlus[1] is
meant to, in part, bring this benefit to existing C++ code bases. The question
is, would a borrow checker for C++ make sense?

[1] shameless plug:
[https://github.com/duneroadrunner/SaferCPlusPlus](https://github.com/duneroadrunner/SaferCPlusPlus)

------
mcguire
"_Similarly, the Read::lines() iterator is very easy to use, but it has one
downside: It allocates a String for each line. Manually allocating and reusing
a String will reduce memory churn and may gain you a bit of performance._"

Back when Rust used "internal" iterators, this could be the default. Now, you
can encapsulate it by passing a function to handle each line.
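A sketch of the buffer-reuse pattern the article alludes to, using `read_line` with a single `String`:

```rust
use std::io::BufRead;

// Sums line lengths while reusing one String buffer, instead of
// letting lines() allocate a fresh String per line.
fn total_line_len<R: BufRead>(mut reader: R) -> std::io::Result<usize> {
    let mut line = String::new();
    let mut total = 0;
    while reader.read_line(&mut line)? > 0 {
        total += line.trim_end_matches('\n').len();
        line.clear(); // keeps the capacity; later lines rarely reallocate
    }
    Ok(total)
}
```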

