Rust Performance: A story featuring perf and flamegraph on Linux (adamperry.me)
111 points by adamnemecek 301 days ago | 10 comments

I'll be around on and off if anyone wants to discuss, although I hope it is a fairly uncontroversial post. Since so many Rust folks (including myself) come from non-native code, I think it's good to take time to document the discovery process as it pertains to Rust and ideally make it easier for future programmers to pick up.

Great writeup.


Bounds checking can be a nuisance in Golang as well. One trick I've picked up: if I want to iterate over some small prefix of a slice, say the first 10 bytes of a byte slice that is known to be longer than 10 bytes, it is tempting to do this:

  for i := 0; i < 10; i++ {

But if you do that, you get a bounds check on every iteration. Much faster to do this:

  subhonk := honk[:10]
  for _, v := range subhonk {

There's no range check inside the loop of a golang range iterator, for obvious reasons.

Rust does this as well with iterators and iterator adapters -- I just forgot to include the version that did so in the blog post. Writing a loop like this:

    let mut count = 0;
    for b in &bwt[(i * self.k) + 1 .. r + 1] {
        if b == a {
            count += 1;
        }
    }

gave me similar performance to the filter/count iterators, as both elide bounds checking. I thought I mentioned that in the parting thoughts, but I may be misremembering and I have to run.
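
For readers following along, here is a minimal sketch of the three variants discussed in this subthread: an indexed loop, a loop over a pre-sliced range, and the filter/count form. The function names and the `start`, `end`, and `a` parameters are hypothetical stand-ins, not the code from the blog post, and whether the indexed version actually keeps its bounds checks depends on what the optimizer can prove.

```rust
/// Indexed loop: each `bwt[i]` access may carry a bounds check
/// unless the optimizer can prove that `i` is always in range.
fn count_indexed(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    let mut count = 0;
    for i in start..end {
        if bwt[i] == a {
            count += 1;
        }
    }
    count
}

/// Slice once (the range is checked a single time up front),
/// then iterate over the resulting sub-slice.
fn count_slice_loop(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    let mut count = 0;
    for b in &bwt[start..end] {
        if *b == a {
            count += 1;
        }
    }
    count
}

/// The same idea written with iterator adapters.
fn count_filter(bwt: &[u8], start: usize, end: usize, a: u8) -> usize {
    bwt[start..end].iter().filter(|&&b| b == a).count()
}
```

All three return the same count for the same inputs (and all three panic if the range runs past the end of the slice); the difference is only in where the range check happens. Comparing them under perf or a benchmark is the way to see whether it matters for a given workload.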

Good to know. For some reason I assumed the first version would be faster because I am providing more information...

For further utility in benchmarking Rust programs, see cargo-benchcmp: https://github.com/BurntSushi/cargo-benchcmp

Are perf and flamegraph the de facto tools for recording and viewing performance?

I was looking for an alternative to Instruments (macOS) on Linux and I came across callgrind and kcachegrind, which worked really well for me when profiling some Rust code.

I'm not sure what's de facto, but I most commonly see other Rust/Linux users reaching for valgrind (callgrind, cachegrind, etc). I've recently used massif (part of valgrind) for doing profiling of heap allocations, as well. In the past I also used oprofile because it gave me a nicely annotated source dump of sample rates in different statements.

I am growing to like perf a lot, though. It seems (haven't measured though) to impose less runtime overhead, and the reporting and source annotation tools are top notch.

It's important to note that perf is a sampling profiler using the CPU performance monitoring unit, whereas callgrind (and all the valgrind tools, in general) uses processor emulation to record every event.
