
Cheap tricks for high-performance Rust - ranadeep
https://deterministic.space/high-performance-rust.html
======
ciprian_craciun
Although the article lists all the usual performance "knobs", I think it
should have put more emphasis on the "hidden performance hogs" that (as the
author mentions in passing) can only be discovered through profiling.

Moreover, it fails to mention that enabling all of these has its own
disadvantages:

* the build time goes through the roof (and if you also disable incremental builds, the larger the code base, the longer it takes to compile, even for a single-line edit);

* just because more code gets inlined, performance might actually drop, since less of the executable code fits in the CPU's low-level caches;

\----

As for profiling, I would say it's a much better "bet" for reducing
execution time than these build "knobs".

For example, a few months ago I rewrote a small Go tool in Rust
([https://github.com/volution/volution-md5-tools](https://github.com/volution/volution-md5-tools)),
which had the simplest of jobs: read two MD5 (or similar) files, and compare
which files are missing, which have changed, etc. Basically populate two maps
and compare them.

Now, initially the code I used was almost a 1-to-1 rewrite using simple
hash-maps, and to my surprise the Go version was twice as fast as even the
Rust release build (with all of the mentioned performance tricks).

So, after digging through the code and profiling it, here are a few
surprising findings on what plagued my execution time (in order of surprise):

* deallocation -- after the Rust program had finished its actual work, it proceeded to deallocate the two large hash-maps, which by itself took (if I remember correctly) at least 25% of the time; (Go didn't have this issue: being garbage collected, its collector simply never kicked in...) (the solution: use an `exit(0)` to make sure the deallocation doesn't happen;)

* `PathBuf` equality comparison -- I used `PathBuf` as keys in my hash-map (because I wanted to canonicalize the paths); however the key comparison took another large percentage of the time, which was solved by switching to `OsString`; (for some reason, comparing two `PathBuf`s implies splitting each of them into components every time, and comparing those one by one;)

* regular expression matching with groups -- apparently it's far more expensive to use regular expressions with groups for parsing than to use them just to "verify" the validity of the syntax, and then switch to another technique to actually tokenize;
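To make the first two findings concrete, here is a minimal sketch using only the standard library. It is not the code from the tool: the helper names (`paths_equal`, `os_strings_equal`, `index_by_os_string`) and the sample entries are made up for illustration. It shows that `Path` equality works on parsed components, while `OsString` equality is a plain comparison of the stored string, and it skips the final deallocation with `std::mem::forget`, which I've swapped in here for the `exit(0)` mentioned above (both avoid running the map's destructor).

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::mem;
use std::path::Path;

/// Path equality is defined on components: both sides are parsed into
/// components, so differently spelled paths can still compare equal.
fn paths_equal(a: &str, b: &str) -> bool {
    Path::new(a) == Path::new(b)
}

/// OsString equality compares the stored string as-is, without any
/// path parsing or normalization.
fn os_strings_equal(a: &str, b: &str) -> bool {
    OsString::from(a) == OsString::from(b)
}

/// Hypothetical helper: index (path, checksum) pairs by OsString keys
/// instead of PathBuf keys.
fn index_by_os_string(entries: &[(&str, &str)]) -> HashMap<OsString, String> {
    entries
        .iter()
        .map(|&(path, hash)| (OsString::from(path), hash.to_string()))
        .collect()
}

fn main() {
    // Repeated separators are ignored when a path is split into components.
    println!("as paths:      {}", paths_equal("a//b", "a/b"));
    println!("as os strings: {}", os_strings_equal("a//b", "a/b"));

    // Made-up sample data standing in for the parsed checksum files.
    let index = index_by_os_string(&[
        ("dir/file-1", "checksum-1"),
        ("dir/file-2", "checksum-2"),
    ]);
    println!("indexed {} entries", index.len());

    // Skip the destructor of the (potentially huge) map; the OS reclaims
    // the memory when the process ends anyway.
    mem::forget(index);
}
```

The trade-off with `OsString` keys is that `a//b` and `a/b` become two different keys, so any normalization you rely on has to happen once, before insertion, rather than on every comparison.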

So, as the saying goes "caveat emptor"... :)

