
Why Rust closures are somewhat hard - fanf2
https://stevedonovan.github.io/rustifications/2018/08/18/rust-closures-are-hard.html
======
Animats
Rust decides at compile time how long the data stored in a closure must live.
You can create situations where that's hard to define, and the compiler,
properly, won't permit that. It's very clever.

(Rust tends to use closures heavily, in places other languages don't. So they
need all the performance they can get.)
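A minimal sketch of the compile-time decision described above (names are illustrative, not from the comment):

```rust
// The compiler decides how long a closure's captured data must live.
// A closure that merely borrows a local cannot escape that local's
// scope; `move` transfers ownership into the closure instead.
fn make_adder(x: i32) -> impl Fn(i32) -> i32 {
    // Without `move`, the closure would borrow `x`, which dies when
    // make_adder returns -- the compiler, properly, rejects that.
    move |n| n + x
}

fn main() {
    let add5 = make_adder(5);
    println!("{}", add5(2)); // prints 7
}
```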

~~~
glittery
One thing that's always concerned me about Rust is that everyone monomorphizes
everything (e.g. closures are always unboxed except in the rare cases where
dynamic dispatch is necessary). Hopefully this is a baseless fear, but isn't
this a recipe for executable bloat?

~~~
withoutboats
It can be! And arguably people have leaned too hard on the decisions made in
the std as an example: in std, everything is generic so that it can be inlined
well, because std APIs underlie everything. At the application layer, this is
not obviously what you want.

However, in many cases, a well designed generic API will be able to take a trait
object in the place of `T`. That way, at the application layer, you can decide
to pass trait objects around, reducing code size and - maybe more importantly
to you - compile times. cargo does this for example in some places where the
performance regression is insignificant but it saves noticeable compile time.

Sometimes libraries accidentally don't set things up so that you can pass
either a concrete `T` or a dynamically dispatched trait object to their APIs;
we're hoping to find a way of making it less likely for this to go wrong in
the future.
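One way this works in practice (a sketch, not from the parent comment): a generic bound that permits unsized types lets the same function accept either a concrete `T` or a trait object.

```rust
use std::io::Read;

// `R: Read + ?Sized` allows both Sized concrete readers (one
// monomorphized copy per type) and `dyn Read` trait objects
// (a single dynamically dispatched copy). The caller chooses.
fn count_bytes<R: Read + ?Sized>(reader: &mut R) -> std::io::Result<usize> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf.len())
}

fn main() -> std::io::Result<()> {
    // Concrete: &[u8] implements Read.
    let mut concrete = &b"hello"[..];
    println!("{}", count_bytes(&mut concrete)?); // 5

    // Dynamic: the application layer can pass a trait object instead.
    let mut dynamic: Box<dyn Read> = Box::new(&b"hi"[..]);
    println!("{}", count_bytes(&mut *dynamic)?); // 2
    Ok(())
}
```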

~~~
nicoburns
Something I've been thinking about a lot recently: would there be scope for a
compiler mode which used virtual dispatch for generics (as well as trait
objects) so as to speed up compile times? (And then you could compile the same
code monomorphised for production.)

~~~
bbatha
It's been proposed before: [https://internals.rust-lang.org/t/idea-
polymorphic-baseline-...](https://internals.rust-lang.org/t/idea-polymorphic-
baseline-codegen/8313)

It's also fairly common to have code like:

    
    
    fn do_it(a: impl AsRef<i32>) {
        let a = a.as_ref();
        // do something with a
    }
    

That could instead be rewritten so that only a thin shim is monomorphised:

    
    
    fn _do_it(a: &i32) {
        // do something with a
    }
    
    fn do_it(a: impl AsRef<i32>) {
        let a = a.as_ref();
        _do_it(a)
    }
    

avoiding the code-bloat and codegen cost of duplicating _do_it's body in every
instantiation.

~~~
nicoburns
Hmm, interesting that Niko thinks MIR optimisation passes and Cranelift are
more likely to happen soon. I so dearly want faster Rust compile times!

------
_hardwaregeek
I always find it funny when people propose adding closures to C. Because the
moment you add closures (in the way most languages implement them, not Rust's
implementation), you're adding automatic boxing, which implies garbage
collection and a runtime. Which basically means you've added objects.
Which...sure you could do, but at that point, you're not adding to C, you're
just making a new language.

~~~
saghm
Would it be possible to add C++'s implementation to C, or does it rely on
other C++ features that C doesn't already support?

~~~
whyever
C does not have `auto`, so you couldn't store closures in local variables.

~~~
v_lisivka
In C, you can use typeof().

    
    
      #define max(a,b) \
      ({ typeof (a) _a = (a); \
          typeof (b) _b = (b); \
        _a > _b ? _a : _b; })
    

[https://gcc.gnu.org/onlinedocs/gcc/Typeof.html](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html)

~~~
wahern
typeof is supported by everything[1] _except_ the C standard. It was rejected
in the 1990s and nobody on the committee seems to want to revisit the
decision.

To be fair, typeof is not as useful without statement expressions (`({ ...
})`), and statement expressions aren't as widely supported.

[1] All the major compilers support it (clang, GCC, IBM, Intel, Microsoft,
Solaris, etc), plus or minus underscores in the identifier.

------
sevensor
So, if you're writing in Rust, why not box everything, most of the time? Pay
the cost of dereferencing to make it easier to reason about closures? Save
unboxed values for situations where they're either trivial to reason about or
you're sure there's a performance problem? Is this regarded as bad form?

(I don't really know Rust, but I'm watching with interest. As my comment may
suggest, I'm more familiar with C and Python.)

~~~
blt
I think the general philosophy of Rust is that programmers should think about
the data ownership semantics of their programs. Rather than boxing everything
to make it easier to reason about closures, one should strive to write
programs that use closures in a way that is easy to reason about.

------
cipherzero
As someone who has recently started learning Rust, I can safely say closure
issues were the hardest and most confusing part of learning to fight with the
borrow checker. This is especially the case since futures are becoming
popular and require closures to be useful (obviously).

This post covers a lot of the things I had to recently learn the "hard" way.
Thanks!

------
amelius
I really want to like Rust.

But writing a doubly linked list was already shown to be difficult [1], and
now closures are too ... It really feels like this is not a language to "get
things done". Please let someone prove me wrong.

[1] [https://rcoh.me/posts/rust-linked-list-basically-
impossible/](https://rcoh.me/posts/rust-linked-list-basically-impossible/)

~~~
shawn
It is decidedly _not_ a language to get things done. I say this as someone who
put himself in the unfortunate position of challenging BurntSushi (author of
ripgrep) to a coding duel, bragging that I could get LuaJIT to match his XSV
performance
([https://github.com/BurntSushi/xsv](https://github.com/BurntSushi/xsv))

It took hours to figure out how to implement this in rust:
[https://github.com/sctb/lumen/blob/master/bin/lumen](https://github.com/sctb/lumen/blob/master/bin/lumen)

If you want to see an "expert" programmer's attempt at making sense of the
language, here you go:
[https://gist.github.com/shawwn/4d7a89188cac72f3e591f47c84d3c...](https://gist.github.com/shawwn/4d7a89188cac72f3e591f47c84d3ca49)

Now that it's done though, I see why people fall in love with Rust. It's
really satisfying seeing Lumen's startup time drop from 200ms to 100ms.

But yes, like -- dude, at one point I just wanted to write a for loop like
this, to grab some environment vars:

    
    
        let mut lumen_host = "luajit";
        let mut lua_path = "";
        let mut node_path = "";
    
        for (name, value) in env::vars() {
            if name == "LUMEN_HOST" {
                lumen_host = &value;
            } else if name == "LUA_PATH" {
                lua_path = &value;
            } else if name == "NODE_PATH" {
                node_path = &value;
            }
        }
        println!("{} {} {} {}", bin, lumen_host, lua_path, node_path);
    
    

Good luck with that! I'm sure there's a way to coax Rust into please-allow-me-
to-just-compile-this, but the borrow checker is a cruel taskmaster. Whatever
the syntax is, that's not it, and spending literally an hour trying to figure
out how to store an environment variable into a separate variable doesn't feel
very productive.

That said, Cargo is amazing. It's truly a wonderful, delightful piece of
software. Cargo crates plus a dynamic language seems like a recipe for
success.

~~~
steveklabnik
Is there a reason you looped through env::vars() rather than calling env::var
on each key you wanted? That'd be much more straightforward:

    
    
      let lumen_host = env::var("LUMEN_HOST").unwrap_or(String::from("luajit"));
      let lua_path = env::var("LUA_PATH").unwrap_or(String::new());
      let node_path = env::var("NODE_PATH").unwrap_or(String::new());

~~~
shawn
_Thank you_. I spent ... more time than I'm comfortable admitting trying to
figure out how to do that.

Out of curiosity, did you spot any other antipatterns? I'm sure there are a
bunch:
[https://gist.github.com/shawwn/4d7a89188cac72f3e591f47c84d3c...](https://gist.github.com/shawwn/4d7a89188cac72f3e591f47c84d3ca49)

I poo poo the language a lot, but it's satisfying to get the hang of it.

    
    
      fn realpath(path: &str) -> &str {
          unsafe {
              return CStr::from_ptr(
                  libc::realpath(
                      path.as_ptr() as *const i8,
                      0 as *mut i8
                      )).to_str().unwrap();
          }
      }
    

Curious about that block in particular. I have no idea what I'm doing, so it'd
be somewhat shocking if that were idiomatic.

~~~
coldtea
> _Thank you. I spent ... more time than I'm comfortable admitting trying to
> figure out how to do that._

The question is why. The relevant examples are everywhere:

[https://doc.rust-lang.org/std/env/fn.var.html](https://doc.rust-
lang.org/std/env/fn.var.html)

[https://doc.rust-lang.org/std/env/fn.vars.html](https://doc.rust-
lang.org/std/env/fn.vars.html)

[https://doc.rust-lang.org/book/second-
edition/ch12-05-workin...](https://doc.rust-lang.org/book/second-
edition/ch12-05-working-with-environment-variables.html)

[http://siciarz.net/24-days-rust-environment-
variables/](http://siciarz.net/24-days-rust-environment-variables/)

~~~
shawn
FWIW, those four links are already purple for me.

The reason it's difficult is that it simultaneously introduces the concepts of
optionals, unwrapping, string processing, variable assignment, and lifetime
management. It's not like it's impossible to figure out. I was only saying the
workload was similar to "baby's first vi session".

~~~
coldtea
That's understandable, but isn't it premature to judge Rust based on such a
session?

That would be like criticizing vi based on "baby's first vi session", because
"couldn't even exit the editor!".

~~~
shawn
I... am not sure. On one hand, it's instructive for language designers to know
what turns out to be a barrier to entry. They're often unexpected things. And
what's worse, users usually run into them and then quietly disappear; you
almost never get the chance to see what trips newbies up in practice. It
wasn't meant as a solid criticism, but more of a cathartic "yes, this language
can be _really_ counterintuitive at times." (Rust is in good company. It's the
same reason Lisp is unpopular.)

On the other hand, you're absolutely right: the hypothesis is, given two
equally-experienced programmers, one of them a Rust aficionado and the other a
LuaJIT maniac, the LuaJIT dev will deliver reliable and maintainable software
with an order of magnitude less effort. So you're right, that remains to be
seen (to put it mildly).

Looking forward to seeing how close I can get to XSV's performance. It's a
great benchmark.

------
skybrian
If you're interested in writing code for maximum readability, might it be
simpler to write for loops more often, rather than tricky type signatures?

~~~
ealhad
The problem with a for loop is that it's just a for loop.

When you see filter, map, or max_by, you know what the code does right away.

~~~
skybrian
Well, mileage varies there. For the most common for loops (equivalent to map
or filter), I find it easy to see what they do at a glance. For many
collection APIs I have to look it up.

~~~
ealhad
I guess it's a matter of taste/habit.

------
madeuptempacct
This may be way too much for this topic, but where do closures get stored?

~~~
sam0x17
If they are "boxed" they are on the heap, otherwise they are on the stack.

~~~
int_19h
Don't they do small object optimization? In C++, implementations of
std::function usually do that to avoid allocating for most common cases.

~~~
steveklabnik
Closures don't implicitly box, so any boxing would be done by the user
directly, and so that would be up to them.
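A small sketch of that distinction (illustrative, not from the comment): the environment lives wherever you put it, and boxing is always explicit.

```rust
fn main() {
    let x = 41;

    // Unboxed: the environment (one captured i32) lives inline,
    // here on the stack. No allocation happens.
    let on_stack = move |n: i32| n + x;
    assert_eq!(std::mem::size_of_val(&on_stack), 4); // just the i32

    // Boxed: heap-allocated, but only because we asked for it.
    let on_heap: Box<dyn Fn(i32) -> i32> = Box::new(move |n| n + x);

    println!("{} {}", on_stack(1), on_heap(1)); // 42 42
}
```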

~~~
int_19h
I meant the Box type itself. In C++, std::string and std::function are usually
intentionally padded, so that when the data is small enough, it can fit inside
the object itself, rather than storing the pointer to heap-allocated storage.

~~~
steveklabnik
A Box'd Fn is two pointers: one to the code, and one to the data. The only way
that this would work is if the data were a single usize, which is unlikely to
cover many useful closures. In theory it could happen, I guess...

~~~
int_19h
The idea is that you intentionally pad your type with fields that don't do
anything other than provide a scratchpad for extra storage, much like over-
allocating an array to avoid resizing on append. If the stored data is
smaller, those go unused, so if you pick the wrong amount, the overhead is too
great. But if you hit the sweet spot, it has very significant perf benefits,
not only because it avoids hitting the allocator on copies, but also because
of data locality.

BTW, why wouldn't a lambda with a single usize field be common? I thought Rust
captures everything by borrow by default; if so, doesn't the implementation
simply store a frame pointer inside the lambda, and use it to access the
captured variables? It seems like an obvious optimization...

~~~
steveklabnik
Yes, I understand SSO. It’s just that these structures are already smaller
than String/Vec would be by a third, so it’s harder to find something that
fits.

> if so, doesn't the implementation simply store a frame pointer inside the
> lambda,

Yes. That’s what I was talking about. That’s why you’d only be able to store
one usize inline. You’d be replacing that single pointer with data. String and
Vec have double the available space!

(Incidentally, String can’t do SSO in Rust due to it exposing the underlying
storage; there are SSO strings available in a package if it’s significant in
your use-case.)

~~~
int_19h
But if a lambda capture is already just a single pointer in this case, what
exactly is on the heap when it's boxed? Like you said, it's one word for
pointer to code, and one word for data. Why can't Box accommodate both
directly within itself without heap-allocating? Is it the desire to have
sizeof Box be a single word? But then why not offer SSO-enabled Box, or maybe
even specifically FnBox, in a separate package, same as with strings?

~~~
steveklabnik
Okay, let's talk about Box.

Box<T> is always a single usize if T is Sized, and two usizes if T is not. T
is not sized when it's a trait. If T is Sized, then Box<T> puts the T on the
heap, and the Box itself is a pointer to it.

Box<Fn()> is a "trait object": the double pointer. It looks like this:

    
    
      pub struct TraitObject {
          pub data: *mut (),
          pub vtable: *mut (),
      }
    

The first pointer is a pointer to the data, and the second pointer is a
pointer to a vtable. So in the Box<Fn()> case, data is the captures, and the
vtable is the code.
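The size difference is directly observable (a quick check, not from the comment itself):

```rust
use std::mem::size_of;

fn main() {
    // Box of a Sized type: a thin pointer, one word.
    assert_eq!(size_of::<Box<i32>>(), size_of::<usize>());

    // Box of a trait object: a fat pointer, two words
    // (data pointer + vtable pointer).
    assert_eq!(size_of::<Box<dyn Fn() -> i32>>(), 2 * size_of::<usize>());
    println!("ok");
}
```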

So when you write code like this:

    
    
      // in a function body somewhere
      let x = 5;
      let f = Box::new(|| x);
    
    

Rust generates code that looks roughly like this:

    
    
      // at item level
      struct Env<'a> {
          x: &'a i32,
      }
    
      // an impl of Fn for Env, i'm not writing that out
    
      // in a function body somewhere
      let x = 5;
      let env = Env { x: &x };
      let memory = GlobalAlloc::alloc(..);
    
      // copy env into memory, i'm not writing that out
    
      let f = TraitObject {
          data: memory,
          vtable: pointer_to_vtable_im_not_writing_out,
      }
    

Make sense? The Env is on the heap.

Before I move on, it's worth noting that, in the case of a closure which has
no captures:

    
    
      let f = Box::new(|| 5);
    

you get:

    
    
      // at item level
      struct Env;
    
      // an impl of Fn for Env, i'm not writing that out
    
      // in a function body somewhere
      let env = Env;
      let memory = GlobalAlloc::alloc(..);
    
      // copy env into memory, i'm not writing that out
    
      let f = TraitObject {
          data: memory,
          vtable: pointer_to_vtable_im_not_writing_out,
      }
    

since Env is a unit struct, it has zero size, and so gets optimized out at
compile time, which also ends up removing all of the data stuff, which means
it's back to a regular function pointer (fn instead of Fn).
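A quick way to see that zero-sized environment in action (a sketch):

```rust
fn main() {
    // A closure with no captures has a zero-sized environment...
    let f = || 5;
    assert_eq!(std::mem::size_of_val(&f), 0);

    // ...and coerces to a plain function pointer (fn instead of Fn).
    let g: fn() -> i32 = f;
    println!("{}", g()); // 5
}
```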

Anyway.

So, if we were to do an SSO-like optimization here, we only have a single
usize, the data pointer, to work with. We still need the function pointer, so
we can't re-purpose it.

In theory, that would mean that an environment with a total size no larger
than a usize _could_ be stored inline. It's not clear to me how often this is
true, or whether it would be worth it. But none of these internal representations
are stable, so it's possible for Box<Fn()> at least.

Compare this to std::string. Different compilers implement it differently, but
let's look at gcc:

[https://github.com/gcc-
mirror/gcc/blob/master/libstdc%2B%2B-...](https://github.com/gcc-
mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/string#L64)

[https://github.com/gcc-
mirror/gcc/blob/master/libstdc%2B%2B-...](https://github.com/gcc-
mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/basic_string.h#L156-L165)

You've got a pointer to the data, the size, and a union of the buffer vs the
capacity. In the end, on a 64 bit system, this is 32 total bytes of space!
Box<Fn()> by contrast, is 16 bytes total, and we can only use 8 bytes of that.
Additionally, strings often hold only a handful of bytes, but closures, given
that they capture by reference by default, have a size of at least 8. _Maybe_
a move closure that holds a bunch of small numbers would make a good fit for
some sort of SSO-style optimization here, but environments tend to be bigger,
and the available space is smaller than for strings.

Does that all make sense?

~~~
int_19h
Not quite - that still doesn't explain why there can't be an SSO-optimized
FnBox with extra padding that's there specifically to fit the kind of Fn that
only needs to store the frame pointer, inline on the stack.

In fact, now I'm even more confused as to why it can't be done generically for
Box<Fn>:

> The first pointer is a pointer to the data, and the second pointer is a
> pointer to a vtable. So in the Box<Fn()> case, data is the captures, and the
> vtable is the code.

If a given lambda can capture via a single frame pointer (because it borrows
everything), why can't that frame pointer just be stored directly in the
`data` field? Why does it have to be indirected, with `data` pointing to
another pointer on the heap?

And if borrow-capturing lambdas don't actually capture via frame pointer, why
not?

Or are you saying that lambdas that borrow everything aren't actually all that
common? But if they aren't, then why does capture by borrowing reference the
default for lambdas?

~~~
steveklabnik
> that still doesn't explain why there can't

There can, you'd have to write it yourself.

> If a given lambda can capture via a single frame pointer (because it borrows
> everything), why can't that frame pointer just be stored directly in the
> `data` field?

If there's a single thing it's borrowing, then yes, it could. If there's any
more than one, it won't fit.

> Or are you saying that lambdas that borrow everything aren't actually all
> that common?

I'm saying that closures that borrow _exactly one_ thing aren't actually all
that common.

~~~
int_19h
I think you're misunderstanding what I mean by "frame pointer".

What I'm saying is that if a lambda is capturing local variables from a single
stack frame by borrow (which ensures that it can't outlive them), then it
doesn't need individual pointers to every variable. It just needs a pointer to
the stack frame where they live, and it can access the individual variables by
offset, same as the function to which that frame belongs does. It's compiler
magic, of course - i.e. it means that such lambdas don't have a
straightforward translation to a regular Rust object. But it's magic that
behaves _as if_ you had individual references, so it's a completely
transparent optimization except for size of the resulting object.

On Intel architectures, you can think of it as stashing away EBP/RBP.

And lambdas that only capture locals from one frame, and all of them by
reference, seems like a very common kind of lambda to me, making that
optimization worthwhile.

~~~
steveklabnik
Ah! Yes, I see now. I think that D implements closures this way? I thought you
meant the closure’s conceptual stack frame.

As for commonality, the most common closures I see are:

* closures which have no environment

* closures that take the environment by move

* everything else

It’d be interesting to analyze crates.io and see though...and how much this
would actually save you. I can't think of a time when the closure is only
pointing to its parent's stack frame and you _would_ box it. Box<Fn()> is
mostly used for returning closures, and there, you'd be using a move closure
to take the env by value.
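The typical shape of that pattern (an illustrative sketch):

```rust
// Returning a closure is the common reason to box: the closure must
// own its environment, so it's a `move` closure, and `Box<dyn FnMut>`
// erases its anonymous type so the function can name a return type.
fn make_counter() -> Box<dyn FnMut() -> u32> {
    let mut count = 0;
    Box::new(move || {
        count += 1;
        count
    })
}

fn main() {
    let mut next = make_counter();
    println!("{} {}", next(), next()); // 1 2
}
```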

And even if you do unnecessarily box: [https://play.rust-
lang.org/?gist=c16ac5983579d52d4997b3f4425...](https://play.rust-
lang.org/?gist=c16ac5983579d52d4997b3f44258748b&version=stable&mode=release&edition=2015)

If you click "show assembly", do_closure_stuff is compiled to:

    
    
      playground::do_closure_stuff:
    	  leal	2(%rdi), %eax
    	  retq
    

It gets entirely optimized away anyway.

------
13415
Everything in Rust is somewhat hard.

~~~
mmirate
... to get an initial implementation of. It's much easier to read, debug and
refactor Rust code than code in almost any other language that offers similar
resource efficiency.

~~~
13415
Ada is way easier to read, debug, maintain, and refactor than Rust. So is the
majority of other halfway safe languages (except Haskell, maybe).

~~~
mmirate
Halfway-safe languages (including Haskell) that aren't Rust, have a nasty
habit of eating memory (and usually CPU time, too) for breakfast, compared to
equivalent programs in C, C++ and Rust. That's what I meant by "resource
efficiency".

