
Generics and Compile-Time in Rust - Bella-Xiang
https://pingcap.com/blog/generics-and-compile-time-in-rust/
======
the_duke
There are plenty of Rust features I'd love to see finalized, like GAT (HKT,
sort of), generators, async trait methods, custom test frameworks, ...

But there is an area that could have a big impact on certain (mostly higher
level) domains, yet doesn't seem to get much attention: better trait objects.

They are severely limited in a few aspects:

* only a single trait/vtable

* casting is only available with Any and you can't cast between different traits, requiring really awkward super-traits with manual conversion methods or hacks like mopa [1]

* object safety rules are cumbersome and prevent important traits like Clone from being usable on trait objects, leading to clone_boxed, clone_arc everywhere, or proc macro solutions like dyn-clone [2]

* ...
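The Clone point in particular forces a well-known manual workaround. A minimal sketch of the boxed-clone pattern (the `Shape`/`Circle` names are made up for illustration):

```rust
// `Clone::clone` returns `Self`, which makes `Clone` non-object-safe:
// you can't write `Box<dyn Shape>` if `Shape` requires `Clone`. The
// usual manual workaround is a boxed-clone method on the trait itself.
trait Shape {
    fn area(&self) -> f64;
    fn clone_boxed(&self) -> Box<dyn Shape>;
}

#[derive(Clone)]
struct Circle {
    radius: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
    fn clone_boxed(&self) -> Box<dyn Shape> {
        Box::new(self.clone())
    }
}

fn main() {
    let a: Box<dyn Shape> = Box::new(Circle { radius: 1.0 });
    let b = a.clone_boxed(); // "clone" through the trait object
    assert!((a.area() - b.area()).abs() < 1e-9);
}
```

Crates like dyn-clone [2] exist to generate exactly this boilerplate.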

Doing anything more fancy with them usually feels annoying. Therefore the
standard library and entire ecosystem strongly favor generics and
monomorphization.

This is generally fine and has worked out well for the language, but there are
plenty of use cases where more advanced trait objects could reduce code size
and compile times with very small impact on performance, while also enabling
some interesting new patterns.

I realize there are plenty of implementation challenges that make work in this
area far from trivial in the current language, but it's frustrating to miss
out on part of a toolbox.

I think Swift is an interesting comparison. The languages are similar in quite
a few aspects, but Swift often prioritizes small code size and dynamic
dispatch over monomorphization. Compile times aren't that great either,
though...

ps: it is briefly mentioned in the post, but switching to LLD has provided
noticeable build time improvements on most of the binary crates I am working
on.

[1] [https://github.com/chris-morgan/mopa](https://github.com/chris-morgan/mopa)

[2] [https://github.com/dtolnay/dyn-clone](https://github.com/dtolnay/dyn-clone)

~~~
saghm
> This is generally fine and has worked out well for the language, but there
> are plenty of use cases where more advanced trait objects could reduce code
> size and compile times with very small impact on performance, while also
> enabling some interesting new patterns

To elaborate a bit on the part about new patterns, I've encountered cases
where trait objects let me define APIs that otherwise wouldn't be possible.
One example: when developing something like a database driver, you might
define a trait EventHandler with the methods `handle_start_event` and
`handle_completion_event`, where users pass in values of types that implement
EventHandler, and you call their handling methods whenever you start or
complete an operation. The most straightforward way to support this is to
give your client type an internal vector of EventHandlers that you can
iterate over, calling the corresponding event handling methods whenever
needed. If you use generics for this, your client type will need to be
generic over the type of EventHandler it can contain, which means you can't
specify EventHandlers with different concrete types. The best way to get
around this is to use something like `Vec<Box<dyn EventHandler>>`.
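A minimal sketch of that pattern (the trait and method names follow the comment above; the handler types are made up, not any real driver API):

```rust
use std::cell::Cell;
use std::rc::Rc;

trait EventHandler {
    fn handle_start_event(&self, op: &str);
    fn handle_completion_event(&self, op: &str);
}

struct Logger;
impl EventHandler for Logger {
    fn handle_start_event(&self, op: &str) {
        println!("start: {}", op);
    }
    fn handle_completion_event(&self, op: &str) {
        println!("done: {}", op);
    }
}

// Uses Rc<Cell<_>> so the count stays observable after the handler is boxed.
struct Counter {
    started: Rc<Cell<u32>>,
}
impl EventHandler for Counter {
    fn handle_start_event(&self, _op: &str) {
        self.started.set(self.started.get() + 1);
    }
    fn handle_completion_event(&self, _op: &str) {}
}

struct Client {
    // Heterogeneous: each element may be a different concrete type.
    handlers: Vec<Box<dyn EventHandler>>,
}

impl Client {
    fn run(&self, op: &str) {
        for h in &self.handlers {
            h.handle_start_event(op);
        }
        // ... perform the operation ...
        for h in &self.handlers {
            h.handle_completion_event(op);
        }
    }
}

fn main() {
    let count = Rc::new(Cell::new(0));
    let client = Client {
        handlers: vec![
            Box::new(Logger),
            Box::new(Counter { started: Rc::clone(&count) }),
        ],
    };
    client.run("find");
    assert_eq!(count.get(), 1);
}
```

With generics instead, `Client<H: EventHandler>` could hold only one concrete handler type; the `Box<dyn _>` is what makes the mixed vector possible.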

I understand the sentiment behind favoring generics over trait objects in the
Rust ecosystem; strongly preferring compile-time costs to runtime costs when
there's a choice between them is one of the more fundamental guiding
principles of the Rust ecosystem (and one of the things I really like about
Rust). But there are patterns that trait objects allow that just can't be
expressed with generics, and when you need one of them, it can be frustrating
to hit some of those sharp edges you mentioned.

~~~
kccqzy
> The best way to get around this is to use something like `Vec<Box<dyn
> EventHandler>>`.

That's exactly the right solution to your problem. It's not a workaround. It
_is_ what you need.

What you need is a heterogeneous collection of objects that implement a common
interface. And the exact type isn't available to you because it's provided by
the client. So you need to go through virtual (or dynamic) dispatch.

Just imagine how you might write this in C++: you just define an EventHandler
abstract class and ask that clients inherit from it. Then you take pointers to
EventHandler and store them.

Or imagine how you might write it in Haskell: simple existential types. The
type class dicts are stored by the compiler into your data type to enable
dynamic dispatching.

This is object-oriented polymorphism 101. I think Rust's anti-OO pendulum has
swung so far that people can't even see that what they need is basic OO.

~~~
gpderetta
It is not even OO, as the Haskell example shows.

But I understand completely the mindset. If you think that you can implement
your solution with full static dispatching and no pointer indirection, having
to add these "ugly workarounds" feels like surrendering.

But there is a continuum between full static dispatch and boxing and
indirection everywhere. Adding a few carefully chosen "dynamic joints" can add
a lot of runtime flexibility at a minimal cost.

------
riquito
I love when authors add how to read aloud something to help newcomers, as in

    
    
    fn print<T: ToString>(v: T) {

> We say that “print is generic over type T, where T implements Stringify”

~~~
seaish
Does anyone actually call it "Stringify"? I would call it "to string".

~~~
smabie
Probably not. If they wanted people to call it Stringify then just name it
stringify. Though personally, I like how instead of toString, Haskell calls it
'show'. Clear, to the point, and one word.

~~~
carlmr
How is "show" clear for string conversion? You could just as well expect a GUI
popup just going from that word.

~~~
smabie
How is toString, you just might expect your object to turn into a ball of
yarn.

~~~
carlmr
String is a very common but specific word in programming for a string of
characters.

Show is a very common but generic word in programming, for showing plots,
showing message boxes, showing anything really. Not knowing Haskell I would
not have expected it to convert an object to a string.

------
twoodfin
This is a good article, but rather misses the point on performance of
monomorphization vs. dynamic dispatch. Yes, CPU indirect branch predictors are
getting better, and compilers are getting smarter about identifying
opportunities to turn dynamic into static dispatch. But inlining remains the
optimizer’s silver bullet, enabling a host of dependent optimizations. It’s
those further optimizations that make the primary performance difference for
static calls.
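The shape of the difference can be seen in a toy comparison (how much the optimizer actually recovers varies by workload; this only illustrates the two dispatch forms):

```rust
// Static dispatch: `op`'s concrete type is known, so the compiler can
// inline it into the loop and then apply the dependent optimizations
// (unrolling, vectorization, const-folding).
fn sum_static<F: Fn(u32) -> u32>(n: u32, op: F) -> u32 {
    (0..n).map(op).sum()
}

// Dynamic dispatch: `op` is an opaque call through a vtable, which the
// optimizer usually cannot inline, blocking those follow-on optimizations.
fn sum_dyn(n: u32, op: &dyn Fn(u32) -> u32) -> u32 {
    (0..n).map(|i| op(i)).sum()
}

fn main() {
    let double = |x: u32| x * 2;
    assert_eq!(sum_static(10, double), 90);
    assert_eq!(sum_dyn(10, &double), 90);
}
```

Both produce the same result; the difference only shows up in the generated machine code.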

~~~
AndrewBissell
I did some investigation into the performance impacts of using the _Generic
keyword in C11. It was a toy example so not necessarily broadly applicable to
real world code, but a good 70 or 80 percent of the speedup from generics
seemed to come from inlining.

[https://abissell.com/2014/01/16/c11s-_generic-keyword-macro-applications-and-performance-impacts/](https://abissell.com/2014/01/16/c11s-_generic-keyword-macro-applications-and-performance-impacts/)

~~~
vlovich123
I think an area ripe for improvement is for compilers to reason about which
monomorphized variants are valuable and can benefit from inlining, and where
dynamic dispatch is sufficient. Start with PGO, and I'm sure smarter people
will investigate how to get 80% of the benefit at compile time without PGO.

------
marcianx
> first, modern CPUs have invested a lot of silicon into branch prediction, so
> if a function pointer has been called recently it will likely be predicted
> correctly the next time and called quickly

Huh, TIL. Branch prediction is normally about predicting which branch an `if`
would take. But apparently this applies to indirect jumps as well:
[https://stackoverflow.com/a/26240197/1082652](https://stackoverflow.com/a/26240197/1082652)

~~~
xenadu02
This also applies to predicting the return address. You can assume any form of
control flow has branch prediction. A surprising amount of silicon ends up
being worth the cost if it can improve prediction just a little bit because a
pipeline stall is so astronomically expensive.

------
rob74
A true gem from the "comments on the last episode":

> _The compile times we see for TiKV aren't so terrible, and are comparable
> to C++_

So if you're already used to the terrible compile times of C++, the compile
times of Rust won't seem that bad in comparison. And Mozilla, where Rust
started, mostly relies on C++. That does explain a lot...

~~~
Ygg2
That, and the more you optimize for runtime performance, the less compile-time
performance you have.

~~~
Gibbon1
I always keep coming back to the fact that code gets compiled a lot more often
than it gets modified. In C land it's typical to run analysis tools on a
codebase as a separate process. Running the sorts of checks Rust does on every
compile seems to seriously waste programmers' time.

It's also a problem that for 95% of the code written, speed doesn't matter at
all.

Combine those two and programmers are paying a lot for something that isn't
needed. AKA 99% of the time you're compiling a module that hasn't changed, and
for 95% of the code in those modules speed doesn't matter at all.

~~~
lmm
Rust is a language specifically for the other 5% though. If you don't have
extreme performance requirements wouldn't you just use OCaml (which compiles
very quickly)?

~~~
Gibbon1
Of late I feel like I care less about perfect languages and more about ABI
compatibility. There is something to be said about most of the time you should
be using a safe mildly performant glue language with a fast write/compile/test
cycle. The big monkey wrench of course is ABI incompatibility. Which has
always left you with C and now Swift as the base language.

------
platz
> In general, for programming languages, there are two ways to translate a
> generic function:

> 1. translate the generic function for each set of instantiated type
> parameters

> 2. translate the generic function just once, calling each trait method
> through a function pointer (via a “vtable”).

The approach in Haskell might be considered a variation of 2, since it
involves indirection, but it's a little different from the vtables other
languages normally use: it's not selecting different implementations at run
time, just looking up the pre-determined implementation through a new
parameter.

In particular, the function is transformed into a higher-order function
accepting a new parameter representing the to_string functionality; at the
call site, the appropriate concrete to_string implementation is passed to the
transformed function. This new higher-order function 'print' only needs to be
compiled once.
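A hand-rolled Rust analogue of that transformation (illustrative only: the value is erased behind `&dyn Any` so that `print` really is compiled once, with the "dictionary" supplied explicitly at the call site):

```rust
use std::any::Any;

// The "dictionary": a record of the operations the function needs,
// analogous to a one-method Haskell type class dictionary.
struct ShowDict {
    show: fn(&dyn Any) -> String,
}

// Compiled exactly once; all type-specific behavior arrives through
// the dictionary parameter, as in GHC's dictionary passing.
fn print(dict: &ShowDict, v: &dyn Any) -> String {
    (dict.show)(v)
}

// One dictionary entry per concrete type, chosen at the call site.
fn show_i32(v: &dyn Any) -> String {
    v.downcast_ref::<i32>().unwrap().to_string()
}

fn main() {
    let dict = ShowDict { show: show_i32 };
    assert_eq!(print(&dict, &42_i32), "42");
}
```

In real Rust a generic `print<T>` would be monomorphized instead; the erasure here is just to mimic Haskell's compile-once behavior.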

~~~
kccqzy
How is a Haskell type class dict different from a vtable?

It's in fact the same.

~~~
tome
Hmm, not sure what you mean. In the body of a function there is one class dict
available. On the other hand there is typically one vtable _per object_.

~~~
ImprobableTruth
I don't know of a single language that has a vtable per object. What would
even be the point?

Edit: Assuming we're talking about objects that can be grouped into some sort
of 'classes'.

~~~
gpderetta
Isn't python __dict__ pretty much a per-object vtable?

------
amw-zero
Once all of this compile-time metaprogramming and code execution starts to
happen, it always makes me ask: doesn’t this just conclude with dynamic
typing? I’m currently a static typing lover. But it’s almost as if we’re just
looking for the full power of a programming language. Why not just use the
language itself instead of a weird, stratified compile time language?

~~~
josephg
Not necessarily. The cleanest version of this sort of thing I’ve seen is zig’s
comptime. Basically you can annotate loops, conditionals, function parameters,
etc to be evaluated at compile time instead of runtime. You’re still writing
zig - it’s the same language; just some of your code runs in the context of
the compiler.

One really nice use of this is printf. In C, printf needs a remarkably large
amount of code and it’s quite slow. In zig, printf marks the format string as
comptime. Then the _compiler_ runs the code looping over and evaluating the
format string, then unrolls the result into the binary. The output ends up
being a few terse calls to write / format with no extra machinery at runtime.
And you get clean compile errors if the format string doesn’t match your
arguments - all with no special casing in the compiler.

More detail, with code: [https://ziglang.org/documentation/master/#Case-Study-printf-in-Zig](https://ziglang.org/documentation/master/#Case-Study-printf-in-Zig)

I think nim has something similar.

I’m surprised rust went with macros instead of something like comptime -
though I assume there’s some trade offs I’m not aware of.

~~~
akiselev
_> I’m surprised rust went with macros instead of something like comptime -
though I assume there’s some trade offs I’m not aware of._

Rust has procedural macros, which are Rust libraries built and run at compile
time that can transform raw tokens into valid Rust code (or into more raw
tokens, for nested macros). Procedural macros can annotate any statement or
code block, although some specific forms might still be nightly-only.

~~~
_nhynes
Crate-level inner attribute macros are still unsupported [0], so if you want
those, you'll need to do the codegen before the compiler takes a pass, or from
within the compiler itself (using, say, an after_parsing [1] hook). They're
really helpful for making a deeply embedded DSL.

[0] [https://github.com/rust-lang/rust/issues/54726](https://github.com/rust-lang/rust/issues/54726)

[1] [https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/trait.Callbacks.html#method.after_parsing](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/trait.Callbacks.html#method.after_parsing)

------
saagarjha
Can we have the title changed to "Generics and Compile Time in Rust"? The way
it's written now, I thought for sure it would be about compile-time
programming using generics.

------
estebank
I am convinced that `rustc` should have an "optimizer lint" phase that
performs the same analyses other languages' optimizers use to change the
behavior of the code, but that instead suggests changes affecting runtime
behavior and compile time, like `Box`ing fields or variants that
disproportionately affect the size of an `enum` [1], or changing generic
params to trait objects (or vice versa) [2] when it makes sense.

The advantage of not doing it automatically is that the behavior of the
compiled code can _always_ be inferred from looking at the source: no magic,
no sudden changes in behavior because some threshold has been passed in some
optimizer.
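The enum-size case is easy to demonstrate with a toy example (not taken from the linked clippy lint):

```rust
use std::mem::size_of;

// One oversized variant inflates every value of the enum...
#[allow(dead_code)]
enum Message {
    Quit,
    Payload([u8; 1024]),
}

// ...while boxing it shrinks the enum to pointer size (the non-null
// niche of `Box` even absorbs the discriminant), at the cost of a heap
// allocation when constructing that variant.
#[allow(dead_code)]
enum BoxedMessage {
    Quit,
    Payload(Box<[u8; 1024]>),
}

fn main() {
    assert!(size_of::<Message>() >= 1024);
    assert!(size_of::<BoxedMessage>() <= 16);
    println!(
        "Message: {} bytes, BoxedMessage: {} bytes",
        size_of::<Message>(),
        size_of::<BoxedMessage>()
    );
}
```

The lint estebank describes would spot the first layout and suggest the second.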

[1]: [https://github.com/rust-lang/rust-clippy/pull/5466](https://github.com/rust-lang/rust-clippy/pull/5466)

[2]: [https://github.com/rust-lang/rust-clippy/issues/14](https://github.com/rust-lang/rust-clippy/issues/14)

------
baby
Rust compile times are just really really bad.

------
blackrock
Nim or Rust?

Some Rust syntax seems overly confusing. But Nim doesn’t really have strong
corporate support backing it.

But, both are still new, and missing a lot of libraries.

Rust is also annoying in that there aren't standardized libraries for common
functions; just some guy's tweet pointing you at some random cargo crate.

------
ed25519FUUU
> _Note that in these examples we have to use inline(never) to defeat the
> optimizer. Without this it would turn these simple examples into the exact
> same machine code. I'll explore this phenomenon further in a future episode
> of this series._

I’m really eager to use more Rust, but these optimizations really turn me off.
Optimizing around the compiler feels like metaprogramming.

~~~
prerok
What do you mean? I think those statements are there only so that the author
can provide a glimpse into the difference in code generation between the two
examples. You would not want to disable inlining when writing code for an
actual project.

