
Rust: Zero-Cost Abstraction in Action - i_dursun
https://idursun.com/posts/rust_zero_cost_abstractions_in_action/
======
bdd
These are not abstractions but compile-time optimizations. They are also
not implemented in the Rust compiler but in LLVM, so any language frontend
sitting in front of LLVM would yield the same optimizations. According to
Wikipedia the list is:

> [...] variety of front ends: languages with compilers that use LLVM include
> ActionScript, Ada, C#, Common Lisp, Crystal, CUDA, D, Delphi, Dylan,
> Fortran, Graphical G Programming Language, Halide, Haskell, Java bytecode,
> Julia, Kotlin, Lua, Objective-C, OpenGL Shading Language, Ruby, Rust, Scala,
> Swift, Xojo, and Zig.

Sometimes I think a mention of Rust in the title just gets upvotes here,
without the article even being read.

~~~
thcz
The C# compiler uses LLVM? Is that for something like Xamarin? I was under
the impression that it was self-contained. Does anyone know the details of
this?

~~~
bdd
Anyone can build a frontend. "Java bytecode" is in that list too. It doesn't
mean the de facto compilers of these languages rely on LLVM.

~~~
jerven
In this case it is Azul Zing, a very serious product. It's one of the four
major VM JIT compilers (OpenJ9, C2, and Graal are the others, in my opinion).

[1] [https://www.azul.com/products/zing/](https://www.azul.com/products/zing/)

~~~
pjmlp
There are also PTC, Aicas, Virtenio, Ricoh and Gemalto, all targeted at
embedded deployments, of which PTC and Aicas are the most well known.

Sadly, Excelsior is no more. I imagine that regular JIT compilers making
AOT/JIT caches available is what killed them.

------
lasagnaphil
This isn’t really about Rust; it’s about the optimization capabilities of
LLVM, the compiler backend that quite a lot of languages use (Clang for C
and C++, Rust, Swift, Julia, Zig, ...). These languages all have a similar
chance of performing those same optimizations, at least in simple cases
like these.

What I’m interested in is how well the IR that each language's frontend
emits optimizes under LLVM in practical situations, not just a few lines of
simple numerical code. I’ve heard that you need to be careful to encode the
IR in the right way so that LLVM does not generate needless memcpys, but
I’m not much of a compiler expert...

~~~
kibwen
rustc does indeed deliberately seek to emit LLVM IR that resembles what Clang
would emit, in order to benefit from the same sorts of optimizations that
default LLVM is tuned for.

------
tiziano88
There is nothing about zero-cost abstractions in this article, just basic
compiler optimizations.

~~~
kibwen
The OP doesn't explicitly demonstrate any zero-cost abstractions; however,
these "basic compiler optimizations" are only feasible to achieve statically
(i.e. without a JIT) in the presence of sufficiently transparent abstractions,
which takes a good deal of language design effort to enable.

------
SeekingMeaning
For anyone interested in reading more about this, I would recommend Zero Cost
Abstractions[1] by withoutboats, who is a contributor to Rust.

1: [https://boats.gitlab.io/blog/post/zero-cost-abstractions/](https://boats.gitlab.io/blog/post/zero-cost-abstractions/)

------
drej
This is not Rust specific; it's a compiler thing. Both LLVM and GCC can
detect sums and generate closed-form formulas instead. There are other fun
algorithm detections, e.g. if you try to do bit counts yourself, LLVM will
use popcnt instead.

Compilers are awesome, check out Matt Godbolt's talk on this very topic:
[https://www.youtube.com/watch?v=nAbCKa0FzjQ](https://www.youtube.com/watch?v=nAbCKa0FzjQ)

------
univerio
I assume it's doing `(N-2)(N-3)/2 + 2N - 3` instead of `N(N-1)/2` due to
overflow concerns? But couldn't `(N-2)(N-3)` also overflow, just with a
larger supported range of `N`?

~~~
devit
In this assembly code it cannot overflow because N is a 32-bit integer and the
multiplication gives a 64-bit result, which is converted to 32-bit only after
shifting.

I can't figure out why it doesn't use the simpler formula (other than the
optimizer being bad).

------
Ericson2314
The earlier transformations are actually more impressive. Constant folding
almost always has a good RoI, so lots of compilers do it, and it's simple
because, well, it's a constant: there are no variables and no partial
evaluation needed. The others require lots of inlining before the final
rewrite fires, and so are more ambitious.

Seeing what

    pub fn sum3(n: i32) -> i32 {
        (1..n).sum::<i32>() + (1..2 * n).sum::<i32>() + (1..(n + 2)).sum::<i32>()
    }

does would be more interesting to me.

Also, while all the inlining is rustc's doing, I assume the "triangular
number trick" is LLVM's.

~~~
SAI_Peregrinus
Godbolt supports Rust, and can show the LLVM IR:
[https://godbolt.org/z/GsccW3](https://godbolt.org/z/GsccW3)

~~~
pjmlp
A bit off-topic: I love how Godbolt grew out of the C++ community to embrace
as many AOT toolchains as possible. Kudos to Matt and everyone involved in
making it happen.

------
ojosilva
I wonder if the `const` syntax introduced in ES6 will ever result in constant
folding, or any JIT optimizations, for code running in JS runtimes.
Apparently, from what I've read, `const` is ignored as far as optimizations
go and is only used to prevent the developer from ever reassigning certain
variables.

------
pkilgore
The OCaml compiler does constant folding too, and I consistently look at the
output (usually JavaScript, because of BuckleScript) in amazement when it
finds shit like that. Turning all my tail recursion into loops is great too.

~~~
pjmlp
For anyone who wants to learn about optimizations in AOT-compiled ML
languages, have a look at:

"The Implementation of Functional Programming Languages"

"Compiling with Continuations"

"Modern Compiler Implementation in ..." (C, Java and ML variants)

Although oriented towards Lisp, "Lisp in Small Pieces" is a classic as well,
with many optimizations like inlining of lambda calls across multiple call
levels.

------
incadenza
Are compiler optimizations like summing a series done on an ad hoc basis?
Surely the compiler couldn't have discovered or inferred (not sure what term
to use) that formula, no?

Just generally curious.

~~~
kibwen
Those optimizations are happening on LLVM's end (though the higher-level
language will have to expose sufficiently transparent abstractions to make
these optimizations possible). I'd love to read a book on the optimization
techniques that are implemented in LLVM/GCC to make transformations like this
possible.

~~~
incadenza
Gotcha. Thanks. So presumably Clang would have done the same for C?

~~~
steveklabnik
Yes, and it's mentioned in the post that the C code was also brought into
parity.

~~~
incadenza
Yeah I saw that, but just wasn’t sure how using intrinsics played a role.

------
boomer_joe
There is nothing impressive about this.
[https://godbolt.org/z/S2tDDh](https://godbolt.org/z/S2tDDh)

------
shmerl
_> he was very disappointed because Rust version was twice as fast than the C
version which was hand-optimised by pulling off all the tricks he knew to make
it perform well._

Why disappointed? It just highlights the quality of Rust's approach.

~~~
ncmncm
Evidently the tricks were what made it slow.

This happens all too often: once the code changes enough that the optimizer
doesn't recognize the pattern anymore, it throws up its hands and you're on
your own. Some people call this optimizer roulette.

It's not just compilers, either. CPUs have their own peephole optimizers and
patterns they recognize, or don't, and it can easily make a 2x difference in
your run time depending on whether it cottons to what you're trying.

~~~
pjmlp
With CPUs it gets even worse, because clever, hand-optimized assembly code
can stop being fast on another CPU or after a firmware update.

The days of Z80, 6502 and similar are long gone.

------
yahyaheee
This is neat, but there is still a cost in compile time.

~~~
JoeCamel
Usually, in the Rust community, "zero cost" means zero runtime cost.
Obviously, there are many other costs you could define.

