
Rust Performance: A story featuring perf and flamegraph on Linux - adamnemecek
http://blog.adamperry.me/rust/2016/07/24/profiling-rust-perf-flamegraph/?
======
dikaiosune
I'll be around on and off if anyone wants to discuss, although I hope it is a
fairly uncontroversial post. Since so many Rust folks (including myself) come
from non-native code, I think it's good to take time to document the discovery
process as it pertains to Rust and ideally make it easier for future
programmers to pick up.

~~~
the_duke
Great writeup.

~~~
dikaiosune
Thanks!

------
honkhonkpants
Bounds checking can be a nuisance in Golang as well. One trick I've picked up
is if I want to iterate over some small prefix of a slice, say the first 10
bytes of a slice of bytes that is known to be longer than 10 bytes, it is
tempting to do this:

    for i := 0; i < 10; i++ {
        something(honk[i])
    }

But if you do that, you get a bounds check on every iteration. Much faster to
do this:

    subhonk := honk[:10]
    for _, v := range subhonk {
        something(v)
    }

There's no range check inside the loop of a Go range iterator, because the
loop can never step past the length of the slice it ranges over.

~~~
dikaiosune
Rust does this as well with iterators and iterator adapters -- I just forgot
to include the version that did so in the blog post. Writing a loop like this:

    let mut count = 0;
    for b in &bwt[(i * self.k) + 1 .. r + 1] {
        if b == a {
            count += 1;
        }
    }

gave me similar performance to the filter/count iterators, as both elide
bounds checking. I thought I mentioned that in the parting thoughts, but I may
be misremembering, and I have to run.
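
For anyone who wants to try the comparison, here is a minimal sketch of the
three variants side by side. The function names, the `start`/`end` parameters,
and the sample input are hypothetical stand-ins, not the code from the blog
post. Whether the optimizer removes the checks in the indexed version depends
on the surrounding code, so treat this as something to measure rather than a
guaranteed result.

```rust
/// Indexed loop: each `bwt[i]` may carry a bounds check unless the
/// optimizer can prove the index is always in range.
fn count_indexed(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    let mut count = 0;
    for i in start..end {
        if bwt[i] == a {
            count += 1;
        }
    }
    count
}

/// Slice once, then iterate: the range is validated a single time when
/// the subslice is created, and the loop walks it without indexing.
fn count_slice_loop(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    let mut count = 0;
    for &b in &bwt[start..end] {
        if b == a {
            count += 1;
        }
    }
    count
}

/// Iterator adapters: the same subslice, counted with filter/count.
fn count_filter(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    bwt[start..end].iter().filter(|&&b| b == a).count()
}

fn main() {
    // Hypothetical sample data; any byte slice works.
    let bwt = b"abracadabra";
    let (start, end, a) = (1, 8, b'a');

    println!("indexed:    {}", count_indexed(bwt, start, end, a));
    println!("slice loop: {}", count_slice_loop(bwt, start, end, a));
    println!("filter:     {}", count_filter(bwt, start, end, a));
}
```

All three return the same count; only the way the elements are reached
differs, which is what the profiler ends up showing.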

------
kibwen
For further utility in benchmarking Rust programs, see cargo-benchcmp:
https://github.com/BurntSushi/cargo-benchcmp

------
conradev
Are perf and flamegraph the de facto tools for recording and viewing
performance?

I was looking for an alternative to Instruments (macOS) on Linux and I came
across callgrind and kcachegrind, which worked really well for me when
profiling some Rust code.

~~~
dikaiosune
I'm not sure what's de facto, but I most commonly see other Rust/Linux users
reaching for valgrind (callgrind, cachegrind, etc). I've recently used massif
(part of valgrind) for doing profiling of heap allocations, as well. In the
past I also used oprofile because it gave me a nicely annotated source dump of
sample rates in different statements.

I am growing to like perf a lot, though. It seems to impose less runtime
overhead (although I haven't measured), and the reporting and source
annotation tools are top notch.

