> Rust doesn't compile that way -- you can't compile individual modules at once,...

ahicks · on April 16, 2017

There are two specific cases here where the layout is not obvious.

The first is the null-pointer optimization (I think this is the official name, but I swear I question myself every time I mention it), in which we use the knowledge that the payload contains a reference to avoid an enum discriminant. That is, Option<i32> will have an extra field up front saying whether it's None or Some, but Option<&i32> will just encode None as the null pointer, because references can't be null. This also optimizes something like Result<&i32, ()>. The net result is that a lot of stuff that looks expensive is basically free. There has been discussion of extending this to use multiple pointers so that we can hit more complicated enums like Option<Option<(&i32, &i32)>>, but this has thus far not happened.
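A quick way to see the null-pointer optimization is to compare sizes with std::mem::size_of. This is a minimal sketch: that Option<&T> has the same size as &T is a documented guarantee, while the exact numbers for the other types are simply what rustc currently produces.

```rust
use std::mem::size_of;

fn main() {
    // i32 has no invalid bit patterns, so Option<i32> needs a separate discriminant.
    println!("i32: {}, Option<i32>: {}", size_of::<i32>(), size_of::<Option<i32>>());
    // &i32 can never be null, so None is encoded as the null pointer.
    println!("&i32: {}, Option<&i32>: {}", size_of::<&i32>(), size_of::<Option<&i32>>());
    // The same trick applies to Result when the error type carries no data.
    println!("Result<&i32, ()>: {}", size_of::<Result<&i32, ()>>());
}
```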

The second is enums themselves. The discriminant algorithm is not obvious. If you want a discriminant of a specific size, you can pick it with a repr attribute such as #[repr(u8)]; otherwise it's implementation-defined.
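For a fieldless enum, a minimal sketch of the difference (the enum names are made up for illustration): with an explicit repr the discriminant type, and hence the size, is fixed; without one, whatever size you observe is just the compiler's current choice.

```rust
use std::mem::size_of;

// Explicit repr: the discriminant is a u8.
#[repr(u8)]
#[allow(dead_code)]
enum Small { A, B, C }

// Explicit repr: the discriminant is a u32.
#[repr(u32)]
#[allow(dead_code)]
enum Wide { A, B, C }

// No repr: the compiler picks the discriminant size; not guaranteed.
#[allow(dead_code)]
enum Unspecified { A, B, C }

fn main() {
    println!("Small: {}", size_of::<Small>());
    println!("Wide: {}", size_of::<Wide>());
    println!("Unspecified: {}", size_of::<Unspecified>());
    // A fieldless enum can be cast to an integer to read its discriminant.
    println!("Wide::B as u32 = {}", Wide::B as u32);
}
```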

There is also a third thing we have discussed doing but haven't done yet. If you have a bunch of enums nested inside each other, having multiple discriminants is a waste. There is no reason the compiler can't just collapse them into one in a lot of (but not all) cases.
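Whether a particular compiler already collapses nested discriminants can be checked by comparing the size of the nested enum with the size of its innermost type. This is a sketch with hypothetical Inner/Outer types; on recent rustc versions the sizes typically match because the outer tag is stored in unused values of the inner one, but that is an implementation detail, not a guarantee.

```rust
use std::mem::size_of;

// Hypothetical nested enums used only for illustration.
#[allow(dead_code)]
enum Inner { A, B, C }

#[allow(dead_code)]
enum Outer {
    Wrapped(Inner),
    Empty,
}

fn main() {
    // If the two numbers on a line match, no second discriminant was added.
    println!("Inner: {}, Outer: {}", size_of::<Inner>(), size_of::<Outer>());
    println!(
        "bool: {}, Option<Option<bool>>: {}",
        size_of::<bool>(),
        size_of::<Option<Option<bool>>>()
    );
}
```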

For anyone who wants to know the specific algorithm for all of this, it's now all in one place: src/librustc/ty/layout.rs

dbaupp · on April 16, 2017

> I'm not sure how Rust crates (or other compilation units) map to those - my guess would be that a given crate will be compiled into (in win32 terms) a .exe .lib or .dll

You're correct.

> but to me this still feels more like a simple-vs-easy trade off [1] than a strict win.

If by the easy side you mean sparing people from having to reorder fields themselves, it's more than that: generics plus C++-style monomorphisation/specialisation mean there are cases where it's impossible for the definition of the type to choose the right order. For instance, given struct CountedPair<A,B> { x: u32, a: A, b: B }, CountedPair<u64, u64>, CountedPair<u64, u8> and CountedPair<u16, u8> all need different orders.
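To make the CountedPair point concrete, here is a sketch that compares Rust's default layout with #[repr(C)], which keeps fields in declaration order as a C or C++ compiler would. CCountedPair and report are hypothetical names added for the comparison. Under repr(C), CountedPair<u64, u8> takes 24 bytes (u32, 4 bytes of padding, u64, u8, 7 bytes of tail padding); on a current rustc the default layout typically brings it down to 16.

```rust
use std::mem::size_of;

// The struct from the comment, with Rust's default (unspecified) layout.
#[allow(dead_code)]
struct CountedPair<A, B> { x: u32, a: A, b: B }

// Same fields, forced into declaration order.
#[allow(dead_code)]
#[repr(C)]
struct CCountedPair<A, B> { x: u32, a: A, b: B }

// Prints both sizes for one instantiation of the generic parameters.
fn report<A, B>(name: &str) {
    println!(
        "{name}: default = {}, repr(C) = {}",
        size_of::<CountedPair<A, B>>(),
        size_of::<CCountedPair<A, B>>()
    );
}

fn main() {
    report::<u64, u64>("CountedPair<u64, u64>");
    report::<u64, u8>("CountedPair<u64, u8>");
    report::<u16, u8>("CountedPair<u16, u8>");
}
```

Moving a to the front in the source would fix the u64/u8 case under repr(C), but would grow CountedPair<u16, u8> from 8 to 12 bytes (u16, 2 bytes of padding, u32, u8, 3 bytes of tail padding), so no single source order is best for every instantiation.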

Manishearth · on April 16, 2017

> I think we have a terminology mixup.

Not really -- my core point was that C++ compilation units are usually smaller than Rust's.

Most C++ codebases I've dealt with have a single stage in which all the .cpp files get compiled one by one, not a step-by-step process where one "module" gets compiled, followed by the modules that depend on it.

For these codebases, you get a huge win if you can touch a header file and cause only a small set of things to be recompiled. In Rust codebases the compilation unit is already large, so you're usually paying that cost anyway. Incremental compilation can reduce it, and does so smartly, so you get a sweet spot where you're not recompiling too much but are not missing anything either.

But yes, being able to skip compilation of downstream crates would be nice.

(You're right that a crate is compiled into a .exe or .so or whatever)

> Pardon my Rust ignorance, but is this scenario significantly different from C++ templates? The layout of a (judiciously) templated C++ class may not be "immediately obvious" but in practice it's often still very straightforward to infer.

Rust's enums are tagged unions. There's a tag, but it can sometimes be optimized out and hidden away elsewhere (as with the null-pointer optimization).

You can mentally unravel templates to figure them out. Enums are a whole new kind of layout that you need to understand.
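To make "tagged union" concrete, here is a minimal sketch (all type names are hypothetical) that writes out by hand what a #[repr(C, u8)] enum looks like in memory: a u8 tag followed by a union of the variants' payloads. With that explicit repr the Rust Reference documents this equivalence; with the default repr the compiler is free to lay the enum out differently, for example by hiding the tag in a niche.

```rust
use std::mem::size_of;

// An enum with an explicit, C-compatible layout.
#[allow(dead_code)]
#[repr(C, u8)]
enum Shape {
    Circle(f32),
    Rect(f32, f32),
}

// Hand-written equivalent: a u8 tag...
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy)]
enum ShapeTag { Circle, Rect }

#[repr(C)]
#[derive(Clone, Copy)]
struct CircleFields { r: f32 }

#[repr(C)]
#[derive(Clone, Copy)]
struct RectFields { w: f32, h: f32 }

// ...followed by a union that overlays the payloads of all variants.
#[repr(C)]
union ShapePayload {
    circle: CircleFields,
    rect: RectFields,
}

#[repr(C)]
struct ShapeRepr {
    tag: ShapeTag,
    payload: ShapePayload,
}

// Reading a union field is unsafe; the tag is what says which field is valid.
fn area(s: &ShapeRepr) -> f32 {
    unsafe {
        match s.tag {
            ShapeTag::Circle => std::f32::consts::PI * s.payload.circle.r * s.payload.circle.r,
            ShapeTag::Rect => s.payload.rect.w * s.payload.rect.h,
        }
    }
}

fn main() {
    let r = ShapeRepr {
        tag: ShapeTag::Rect,
        payload: ShapePayload { rect: RectFields { w: 2.0, h: 3.0 } },
    };
    println!("area = {}", area(&r));
    println!("Shape: {}, ShapeRepr: {}", size_of::<Shape>(), size_of::<ShapeRepr>());
}
```

The enum version gives you the same bytes without the unsafe reads: match checks the tag for you, which is why Rust code rarely writes this out by hand.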