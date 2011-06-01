A few thoughts from a C/C++ perspective (I have only tinkered briefly with Rust):
- This effectively makes the optimization algorithm part of the Rust ABI; you cannot change (or even fix a bug in) the layout algorithm without breaking binary compatibility. The simpler the algorithm, the less of a concern this would be.
- In a systems programming context, you don't always want to optimize a struct for size. You may reorder a large struct to optimize for other properties like cache coherency, or because you have to match a memory layout specified by other software or even hardware. Obviously one can still use 'repr(C)' for these cases.
- Related to the prior point, it's a little disconcerting that adding a field to the end of a struct declaration can completely change the layout. Sometimes in C/C++ land programmers 'cheat' by adding fields to the end of a struct that's used in multiple modules, and initially only recompile the modules that need to know about the new field. (For purposes of compilation expediency/iteration speed.)
- This seems like it may violate the principle of least-surprise pretty badly. As long as we continue to use debuggers and have crash dumps, systems programmers will sometimes need to look at raw memory data and figure out what the structure is. If the field layout isn't easily predicted, that may be very difficult. In my experience there can be huge benefits to minimizing the number of opaque/hard-to-predict transformations between authored data and its final representation. (In this case, the struct declaration and its final memory layout.)
For these reasons it seems like this auto-reorder behavior might be better as an opt-in rather than opt-out behavior (which it sounds like it is.) Regardless, it's still a feature I wish I could have in C++ from time to time!
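To make the size trade-off concrete, here is a minimal Rust sketch (the struct names are made up for illustration) that compares the same three fields under `repr(C)` and under the default representation. The `repr(C)` size follows the C rules. The default-repr size is deliberately unspecified, although a compiler that reorders fields can typically place both `u8`s after the `u32` and fit the struct in 8 bytes.

```rust
use std::mem::size_of;

// Same fields, same declaration order; only the repr differs.
#[allow(dead_code)]
#[repr(C)]
struct DeclOrder {
    a: u8,  // offset 0, followed by 3 bytes of padding
    b: u32, // offset 4
    c: u8,  // offset 8, followed by 3 bytes of tail padding
}

#[allow(dead_code)]
struct CompilerOrder {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // repr(C) follows the C layout rules: 12 bytes on common platforms.
    println!("repr(C):      {} bytes", size_of::<DeclOrder>());
    // Default repr: the layout is unspecified, so the compiler may reorder fields.
    println!("default repr: {} bytes", size_of::<CompilerOrder>());
}
```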
And yet they claim it's a systems programming language. While I can see the value in doing this, I can also see the value in doing it manually. Another criterion can be putting the most frequently used fields together, so accessing one causes the others to load in the same cache line. In many programs the size of the structure is not important; it's the cache behavior that is, and that depends on a lot more than the size of the elements alone. It's probably important that it can be disabled with repr(C).
C++ also doesn't guarantee a stable ABI, and even the C ABI is only de facto standardized on various platforms; nothing is provided by the language standard itself.
Note that "platform" means e.g. Linux-x86-64 and not Linux. Linux-ppc has different rules, for example.
Rust absolutely could be a systems programming language, but it will need at least repr(C) and such options to interface with: a) hardware, b) code not written in Rust or compiled with a different version of the Rust compiler. Sometimes (b) means "with itself", e.g., via structs saved in files with no additional encoding.
There would be lots of problems if we tried to standardize the ABI; we would be forever locked into all sorts of decisions that would be impossible to change. For instance, we'd be unable to fix bugs in much of the standard library. This is why, for example, C++ does not standardize the ABI.
> You may reorder a large struct to optimize for other properties like cache coherency
Optimizing for size is much more commonly desired than manually cache-aligning things. Optimizing for manual cache alignment would be the wrong default.
> or because you have to match a memory layout specified by other software or even hardware. Obviously one can still use 'repr(C)' for these cases.
In fact you always had to use `repr(C)` for those cases. It's just that it wasn't enforced before.
> Sometimes in C/C++ land programmers 'cheat' by adding fields to the end of a struct that's used in multiple modules, and initially only recompile the modules that need to know about the new field.
This only works if you throw type/memory safety out the window. In a language like Rust that tries to encourage memory safety, that wouldn't make sense.
> - This seems like it may violate the principle of least-surprise pretty badly. As long as we continue to use debuggers and have crash dumps, systems programmers will sometimes need to look at raw memory data and figure out what the structure is.
Rust data type layout has always been non-obvious in various ways, because of generics, zero-sized types, the various enum optimizations, etc. Not to mention that LLVM can already split apart structures on the stack in C, C++, and Rust.
> For these reasons it seems like this auto-reorder behavior might be better as an opt-in rather than opt-out behavior (which it sounds like it is.)
In C++, maybe. In Rust, it's a different story, and I'm confident we made the right decision.
To be clear, this "enforcing" now just means that, if you forgot #[repr(C)], your fields might be in a different order than you assume.
But it has been linted for a very long time, at least for direct interactions with FFI; to hide from that lint the fact that you're using a Rust struct for C data, you'd have to be casting pointers to/from C.
Rust doesn't compile that way -- you can't compile individual modules separately, only the entire crate. Incremental compilation is supposed to help detect intra-crate dependencies so that you don't have to compile as much, but this trick wouldn't apply in the incremental compilation case either, since you can't manually drive that.
The nature of crates means that you can rely more on inlining happening and a whole host of other optimizations. Additionally, optimizations like this make more sense when the compilation unit boundary is the crate.
> As long as we continue to use debuggers and have crash dumps, systems programmers will sometimes need to look at raw memory data and figure out what the structure is.
As long as you're using a debugger, it's fine -- we still give the correct debuginfo.
The presence of ADTs in Rust means that the layout of many types isn't immediately obvious without debuginfo anyway.
Of course you may be in a situation where you can't rely on the debuginfo (stripped binary or something?), in which case this will be annoying. But it's really a similar situation as you have with inlining when you don't have debuginfo.
I think we have a terminology mixup. I was using 'module' in the win32 LoadModule() sense: a shared dynamically loaded library (ie. a .DLL on Windows or .SO on Linux.) I'm not sure how Rust crates (or other compilation units) map to those - my guess would be that a given crate will be compiled into (in win32 terms) a .exe, .lib or .dll.
I /think/ the Rust equivalent of the case I'm describing would be that you have a struct that's part of the public API of a crate, and it's being used across multiple crates in a large project where you don't want to fully recompile the world in order to test your changes.
> Of course you may be in a situation where you can't rely on the debuginfo (stripped binary or something?), in which case this will be annoying. But it's really a similar situation as you have with inlining when you don't have debuginfo.
In my C++ experience there end up being plenty of cases where it's really useful to be able to inspect raw memory (ie. hex dump, with no debugger or without enough context for the debugger to help you) and figure out what was going on. Obviously Rust is designed to dramatically reduce the frequency of that kind of debugging, but to me this still feels more like a simple-vs-easy trade off [1] than a strict win.
> The presence of ADTs in Rust mean that the layout of many types isn't immediately obvious without debuginfo anyway.
Pardon my Rust ignorance, but is this scenario significantly different from C++ templates? The layout of a (judiciously) templated C++ class may not be "immediately obvious" but in practice it's often still very straightforward to infer.
[1] https://www.infoq.com/presentations/Simple-Made-Easy
The first is the null-pointer optimization (I think this is the official name but I swear I question myself every time I mention it), in which we use the knowledge that an inner type contains a reference to avoid storing an enum discriminant. That is, Option<i32> will have an extra field up front saying if it's None or Some, but Option<&i32> will just encode None as the null pointer because references can't be null. This also optimizes something like Result<&i32, ()>. The net result is that a lot of stuff that looks expensive is basically free. There has been discussion of extending this to use multiple pointers so that we can hit more complicated enums like Option<Option<(&i32, &i32)>>, but this has thus far not happened.
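A small Rust sketch of the null-pointer optimization described above. It only prints sizes, and the exact numbers in the comments assume a typical 64-bit target.

```rust
use std::mem::size_of;

fn main() {
    // i32 has no invalid bit patterns, so Option<i32> needs a separate discriminant
    // (typically 8 bytes in total: the tag plus padding, then the 4-byte value).
    println!("i32:              {} bytes", size_of::<i32>());
    println!("Option<i32>:      {} bytes", size_of::<Option<i32>>());

    // A reference can never be null, so None can be encoded as the null pointer
    // and no extra discriminant is stored.
    println!("&i32:             {} bytes", size_of::<&i32>());
    println!("Option<&i32>:     {} bytes", size_of::<Option<&i32>>());
    println!("Result<&i32, ()>: {} bytes", size_of::<Result<&i32, ()>>());
}
```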
The second is enums themselves. The discriminant algorithm is not obvious. If you want a discriminant of a specific size, you can pick it with a repr. But otherwise it's implementation defined.
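A minimal sketch of pinning the discriminant with a primitive repr (the enum names are invented for illustration). Without such an attribute, the discriminant's size and encoding are left to the compiler.

```rust
use std::mem::size_of;

// No repr: the discriminant's size and encoding are implementation-defined.
#[allow(dead_code)]
enum Unpinned {
    A,
    B,
    C,
}

// A primitive repr fixes the discriminant type; explicit values are allowed too.
#[allow(dead_code)]
#[repr(u8)]
enum Small {
    A = 1,
    B = 2,
    C = 200,
}

#[allow(dead_code)]
#[repr(u32)]
enum Wide {
    A,
    B,
    C,
}

fn main() {
    println!("Unpinned: {} byte(s), not guaranteed", size_of::<Unpinned>());
    println!("Small:    {} byte(s)", size_of::<Small>());
    println!("Wide:     {} byte(s)", size_of::<Wide>());
    // For fieldless enums, `as` yields the discriminant value.
    println!("Small::C as u8 = {}", Small::C as u8);
}
```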
And there is one third thing we have discussed doing but haven't yet. If you have a bunch of enums nested inside each other, having multiple discriminants is a waste. There is no reason the compiler can't just collapse them down into 1 in a lot (but not all) cases.
For anyone who wants to know the specific algorithm for all of this, it's now all in one place: src/librustc/ty/layout.rs
You're correct.
> but to me this still feels more like a simple-vs-easy trade off [1] than a strict win.
If you're meaning the easy side is stopping people having to reorder fields themselves, it's more than that: generics plus C++-style monomorphisation/specialisation mean there are cases when it's impossible for the definition of the type to choose the right order. For instance: given struct CountedPair<A,B> { x: u32, a: A, b: B }, all three of CountedPair<u64, u64>, CountedPair<u64, u8> and CountedPair<u16, u8> need different orders.
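To see why no single declaration order can be right for every instantiation, here is a toy layout calculator in Rust. It is not rustc's algorithm: as a simplifying assumption it just sorts fields by decreasing alignment (keeping declaration order among ties) and then places them with C-style padding. The `Field` type and the helper functions are hypothetical.

```rust
#[derive(Clone, Copy, Debug)]
struct Field {
    name: &'static str,
    size: usize,
    align: usize,
}

fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

/// Places fields in the given order with C-style padding.
/// Returns (name, offset) pairs and the total size.
fn layout(fields: &[Field]) -> (Vec<(&'static str, usize)>, usize) {
    let mut offset = 0;
    let mut max_align = 1;
    let mut placed = Vec::new();
    for f in fields {
        offset = round_up(offset, f.align);
        placed.push((f.name, offset));
        offset += f.size;
        max_align = max_align.max(f.align);
    }
    (placed, round_up(offset, max_align))
}

/// Simplified reordering: stable sort by decreasing alignment.
fn reordered(fields: &[Field]) -> Vec<Field> {
    let mut v = fields.to_vec();
    v.sort_by(|a, b| b.align.cmp(&a.align));
    v
}

/// CountedPair<A, B> { x: u32, a: A, b: B }, with (size, align) for A and B.
fn counted_pair(a: (usize, usize), b: (usize, usize)) -> Vec<Field> {
    vec![
        Field { name: "x", size: 4, align: 4 },
        Field { name: "a", size: a.0, align: a.1 },
        Field { name: "b", size: b.0, align: b.1 },
    ]
}

fn main() {
    let cases = [
        ("CountedPair<u64, u64>", counted_pair((8, 8), (8, 8))),
        ("CountedPair<u64, u8>", counted_pair((8, 8), (1, 1))),
        ("CountedPair<u16, u8>", counted_pair((2, 2), (1, 1))),
    ];
    for (label, fields) in cases.iter() {
        let (_, decl_size) = layout(fields);
        let (order, sorted_size) = layout(&reordered(fields));
        println!("{label}: declared order {decl_size} bytes, reordered {order:?} {sorted_size} bytes");
    }
}
```

Under this heuristic the three instantiations end up as `a, b, x`, `a, x, b` and `x, a, b` respectively, and `CountedPair<u64, u8>` shrinks from 24 to 16 bytes.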
Not really -- my core point was that C++ compilation units are usually smaller than Rust.
Most C++ codebases I've dealt with will be of the kind where there's a single stage where all the cpp files get compiled one by one. Not a step by step process where one "module" gets compiled followed by its dependencies.
For these codebases, you have a huge win if you can touch a header file and only cause a small set of things to be recompiled. For Rust codebases, it's already a large compilation unit, so you're usually already paying that cost (and with incremental compilation the compiler can reduce that cost, but smartly, so you get a sweet spot where you're not compiling too much but are not missing anything either).
But yes, being able to skip compilation of downstream crates would be nice.
(You're right that a crate is compiled into a .exe or .so or whatever)
> Pardon my Rust ignorance, but is this scenario significantly different from C++ templates? The layout of a (judiciously) templated C++ class may not be "immediately obvious" but in practice it's often still very straightforward to infer.
ADTs are tagged unions. There's a tag, but it can sometimes be optimized out and hidden away elsewhere.
You can mentally unravel templates to figure them out. Enums are a whole new kind of layout that you need to understand.
I think the official word is that Rust has no stable ABI, unless you use repr(C) to force the C ABI?
How would that work, with some modules assuming an incorrect size for a given type?
I agree with your concerns regarding debuggers and crash dumps, though in my experience you need debuggers far less often when writing Rust code, and when you do need them (e.g. for unsafe code) you've often already used `repr(C)` on the types of interest.
If they only consume pointers to that type (ie. never allocate it themselves, deal with arrays, etc.) and their notion of the memory layout is a prefix-identical subset of the 'real' type, then they can't tell the difference. And sometimes that's enough to get by while you test a new feature! A similar trick is initially adding new virtual functions to the very end of a C++ class declaration (regardless of where in the declaration they 'should' be for readability etc.) so the vtable is prefix-compatible with other modules that you haven't recompiled yet. If you work in a large C++ codebase with lots of DLLs, it's a dirty trick that can come in very handy.
The memory prefix trick is an ad-hoc version of the more 'legitimate' approach of using inheritance to expose only part of the memory layout of a public API:
struct PublicStruct { /* ... */ }; // shared as part of the API details
struct InternalImpl: public PublicStruct { /* ... */ };
This is also related to other memory tricks like storing a null-terminated string's length (or hash) in the memory before the address of its first character, or how most malloc implementations have a per-allocation bookkeeping header that lives in the memory immediately preceding the address returned to their caller. There are countless cases in systems programming where programmers do sneaky things with memory (for better or worse...)
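In Rust terms, the prefix trick only has defined behavior when both types are `#[repr(C)]`, because only then are fields guaranteed to sit in declaration order. A minimal sketch with invented type names, where `old_reader` stands in for code that was compiled against the older struct and reads a prefix of the newer one through a pointer:

```rust
// Both types are repr(C), so `id` and `flags` are at the same offsets in each.
#[repr(C)]
struct V1 {
    id: u32,
    flags: u32,
}

#[repr(C)]
struct V2 {
    id: u32,
    flags: u32,
    extra: u64, // new field appended at the end
}

/// Stands in for code compiled against `V1` that only ever sees a pointer.
///
/// # Safety
/// `p` must point to a live value whose layout begins with `V1`'s fields.
unsafe fn old_reader(p: *const V1) -> (u32, u32) {
    let v = unsafe { &*p };
    (v.id, v.flags)
}

fn main() {
    let new_value = V2 { id: 7, flags: 0b101, extra: 42 };
    // SAFETY: V2 is repr(C) and starts with exactly V1's fields.
    let (id, flags) = unsafe { old_reader(&new_value as *const V2 as *const V1) };
    println!("id = {id}, flags = {flags}, extra = {}", new_value.extra);
}
```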
I disagree with the "it's already repr(C)" bit. While for Stylo most of my debugger usage has been on C++ or FFI code, when working on Servo proper I've had to debug plenty of pure Rust Servo code, and have rarely dipped into repr(C) code.
I do agree that with Rust I've had to use a debugger less often overall.
And it's not exactly like C struct packing rules are particularly burdensome cognitively. They're not. The only real question is whether packing should be the default (Rust) or optional (e.g., Pascal).
Suppose you're just writing a struct to a file, which you'll read back later. It can't be the case that later you might fail because you got recompiled with an updated struct packing algorithm in the compiler. Sure, you might say "use an encoder", but if you don't need to be concerned with byte ordering, for example, then writing a struct as-is is the simplest and fastest encoding, and this is a totally legitimate design.
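A hedged sketch of the "write the struct as-is" approach in Rust: a padding-free `#[repr(C)]` record (a made-up type) is turned into raw bytes and read back. A `Vec<u8>` stands in for the file, and the bytes are in native byte order, which is exactly the trade-off described above.

```rust
use std::mem::size_of;

// repr(C) with fields chosen so there is no padding: 4 + 4 + 8 = 16 bytes.
// Padding-free matters: reading padding bytes as u8 would be undefined behavior.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Record {
    id: u32,
    kind: u32,
    value: u64,
}

fn to_bytes(r: &Record) -> Vec<u8> {
    // SAFETY: Record is repr(C), has no padding, and all fields are plain integers.
    let bytes = unsafe {
        std::slice::from_raw_parts(r as *const Record as *const u8, size_of::<Record>())
    };
    bytes.to_vec()
}

fn from_bytes(bytes: &[u8]) -> Option<Record> {
    if bytes.len() < size_of::<Record>() {
        return None;
    }
    // SAFETY: length checked above; read_unaligned does not require the buffer to be
    // aligned; every bit pattern is a valid Record because all fields are integers.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const Record) })
}

fn main() {
    let r = Record { id: 1, kind: 2, value: 3 };
    let buf = to_bytes(&r); // in real code this buffer would be written to a file
    assert_eq!(buf.len(), 16);
    assert_eq!(from_bytes(&buf), Some(r));
}
```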
Any time you need stability, you need something like C struct packing rules.
IMO C struct packing should be the default, as that is safer. Though I can live with it being optional.
It sounds like any time you need interoperability you need C struct packing rules. That interoperability might be with the filesystem, but it's still interaction with non-Rust code.
I think what's needed is a set of routines to reorganize data to and from Rust and C representations, with passing an already converted type as a no-op. That gives people the opportunity to choose repr(C) during development if they need to interop with C or want to optimize for cache access, or to convert to and from on the fly if they just want to store to disk in a stable format occasionally but still want to reap the benefits of lower memory usage in normal operation. That seems like it would be the best of all worlds.
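One way such conversion routines could look, as a sketch only: `Particle`, `ParticleC` and `to_c_repr` are hypothetical names, not an existing library. Each `From` impl copies field by field, so the compiler stays free to choose `Particle`'s layout, while std's blanket `impl<T> From<T> for T` makes passing an already-converted value a plain move.

```rust
// Hypothetical types: `Particle` uses the default (compiler-chosen) layout for
// everyday in-memory use; `ParticleC` is a repr(C) mirror with a fixed layout.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Particle {
    alive: bool,
    pos: [f32; 3],
    id: u64,
}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct ParticleC {
    alive: bool,
    pos: [f32; 3],
    id: u64,
}

impl From<Particle> for ParticleC {
    fn from(p: Particle) -> Self {
        // Field-by-field copy; each side keeps its own layout.
        ParticleC { alive: p.alive, pos: p.pos, id: p.id }
    }
}

impl From<ParticleC> for Particle {
    fn from(c: ParticleC) -> Self {
        Particle { alive: c.alive, pos: c.pos, id: c.id }
    }
}

/// Accepts either representation. For a value that is already a `ParticleC`,
/// the blanket `impl<T> From<T> for T` makes `.into()` a plain move.
fn to_c_repr<T: Into<ParticleC>>(value: T) -> ParticleC {
    value.into()
}

fn main() {
    let p = Particle { alive: true, pos: [1.0, 2.0, 3.0], id: 42 };
    let c = to_c_repr(p); // converted
    let c_again = to_c_repr(c); // already converted: no-op
    assert_eq!(c, c_again);
    assert_eq!(Particle::from(c_again), p);
}
```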
Oh, and a core library with struct packing routines for every version would be very useful to get out of those tight spots where some dev accidentally stored Rust-packed data to disk and found out about the problem after the fact. Being able to call RustRepr::rust1_16_to_C() or RustRepr::rust1_16_to_rust() (for the current algorithm) in Rust 1.45, when the algorithm changed in 1.42, would be really useful...
Nonetheless, other things that aren't stable include the discriminant values of an enum and whether we use the null pointer optimizations. And probably others. Doing this to any Rust type is not safe as-is, and guaranteeing an ABI for all types just freezes the algorithm, as I've said elsewhere in this thread. So I'm not seeing how this is a particular problem.
Very impressive effort to work on something as technical as optimising a Rust compiler.
And to write about it so eloquently just makes it even more impressive.
I wonder how many people have actually read through such a huge amount of specialized source code. I know I haven't, and I don't need a screen reader to do so! Now I actually feel quite ashamed.
Mad props to you, Austin! Keep up the outstanding work!
P.S. Mozilla, hire this guy.
Man I suck as a programmer; I complain about every little thing and barely ever program.
I make the effort to match formatting but I hate it because I get basically no benefit, and the early reviews here had lots of places where eddyb made me change it until I finally learned all the rules for this project. A standard formatting tool and official list of rules for Rust is in the works. That'll be nice to have.
People have started going down the blind programming language road; look up Quorum. What's not obvious about it is that mostly it's used at schools for the blind.
But I think it's incredibly stupid. I'm a very good C++ programmer. I'm at least reasonable at Python. I know enough Haskell that I'm no longer actively frightened of the monad (i.e. a newbie, but past the biggest hurdle). Obviously I both know and actively like Rust.
You can make a programming language for the blind. You can't make people use it. If we go down that path, we end up in a world where blind programmers work for the blind programming shop because the rehab agencies push you there. And then you make less money.
I know at least 10 other blind programmers. 3 of them either work or have worked at Microsoft. 2 of them maintain my screen reader, which is something around 40000 lines of C++ and Python doing incredibly complicated things to fully hook into the OS. One of them is either currently at or has worked at Google; I'm not in touch with him anymore, so I don't know if he moved on or not.
We don't need programming languages for the blind. We need people to stop tying technologies closely to IDEs and then not making said IDEs accessible.
My other project, Libaudioverse, is cross platform. The irony is that this is because VS was inaccessible. I used CMake instead and learned cdb when I needed a debugger. VS is at the point of dreadfully inconvenient but workable now, but I still don't use the Microsoft stack. It puts me in the position of being at their mercy: if they roll out a new UI tomorrow and it's inaccessible (as they did with 2010), all my personal projects stop instantly, and I might lose my job when my employer upgrades it, depending on how bad it is. Microsoft is one of the better places for accessibility, yet there are issues with VS for screen reader users that have been around and reported for years.
BTW: the Rust team now runs the unit tests of packages on crates.io, so each compiler release is verified against almost all public Rust code.
Ah, giving CPAN Testers[1] some competition. :)
If you haven't already got plans for it (or already done it!), I highly suggest starting an initiative to get lots of donated resources so you can really fill out a testing matrix and make it public[2]. Perl's CPAN is a good place to look for what works and what they would do differently (or what they have planned), as they have an amazingly robust module testing ecosystem.
1: http://www.cpantesters.org/
2: http://matrix.cpantesters.org/?dist=Path-Tiny+0.104
Nonetheless, the idea isn't original to us.
0: http://blog.ezyang.com/2011/06/debugging-compilers-with-opti...
This property also ensures a (hypothetical) type like `CachePadded<T>`, which keeps a value of type T in its own cache line, works as expected.
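A minimal sketch of such a `CachePadded<T>` (still a hypothetical type here, and 64 bytes is an assumed cache-line size): `#[repr(align(64))]` raises the alignment, and because a Rust type's size is always a multiple of its alignment, each wrapped value occupies whole 64-byte slots.

```rust
use std::mem::{align_of, size_of};

// Hypothetical wrapper; 64 bytes is assumed to be the cache-line size.
#[allow(dead_code)]
#[repr(align(64))]
struct CachePadded<T>(T);

fn main() {
    // Two adjacent counters land in different 64-byte slots.
    let counters = [CachePadded(0u64), CachePadded(0u64)];
    let first = &counters[0] as *const CachePadded<u64> as usize;
    let second = &counters[1] as *const CachePadded<u64> as usize;
    println!(
        "align = {}, size = {}, distance = {}",
        align_of::<CachePadded<u64>>(),
        size_of::<CachePadded<u64>>(),
        second - first
    );
}
```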
You always had to, the compiler explicitly never guaranteed layout for repr(rust), it just didn't perform any optimisation yet before TFA's changes.
Pascal had "packed" as an option for structures and arrays. Using "packed array of boolean" got you a bit array. Rust has "repr(packed)", but it was broken as of Rust 1.0. Something that has no machine address, such as an individual bit, is apparently troublesome. C supports bit fields in structs, but that feature is seldom used.
It always has, we lint for this, and the Rust struct layout has been officially left unspecified (making assumptions based on it UB) from before 1.0, anyway.
> One would think that would be 'repr("C")', since here, "C" is neither a reserved word nor a variable, but, whatever.
Neither is 'repr' - they're both just identifiers in an attribute. #[attr("string literal")] is newer and still unstable (as part of the new macro system).
The bit array packing is something we've wanted to do for a long while, but it requires an opt-in along the lines of "disallow taking references to any (sub-byte) fields".
#[repr(packed)] is the same as the C equivalent, be it attribute or pragma, in that it only removes alignment padding and does nothing to booleans.
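A short Rust sketch (type names invented) of what `repr(packed)` does and does not do: it drops alignment padding, but each `bool` still occupies a full byte.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct Aligned {
    tag: u8,    // offset 0, followed by 3 bytes of padding
    value: u32, // offset 4
}

#[repr(C, packed)]
struct Packed {
    tag: u8,    // offset 0
    value: u32, // offset 1, no padding
}

// packed removes padding but does not turn bools into bits.
#[allow(dead_code)]
#[repr(C, packed)]
struct Flags {
    a: bool,
    b: bool,
    c: bool,
}

fn main() {
    let p = Packed { tag: 1, value: 0xDEAD_BEEF };
    // `&p.value` would be a reference to a possibly misaligned field, which is not
    // allowed; reading the fields by value copies them out and is fine.
    let tag = p.tag;
    let value = p.value;
    println!(
        "Aligned: {} bytes, Packed: {} bytes, Flags: {} bytes",
        size_of::<Aligned>(),
        size_of::<Packed>(),
        size_of::<Flags>()
    );
    println!("tag = {}, value = {:#x}", tag, value);
}
```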
Every microcontroller I've ever worked with has header files defining bit fields for hardware registers. It's a matter of what level you're writing for. I would expect Linux to contain tons of these.
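Since Rust has no C-style bit fields, register definitions like these are usually written as shift-and-mask accessors. A minimal sketch with an invented 32-bit register layout (`ControlReg` and its fields are hypothetical, not taken from any real datasheet):

```rust
// Hypothetical 32-bit control register (layout invented for illustration):
//   bit 0       ENABLE
//   bits 1..=3  MODE     (3 bits)
//   bits 8..=15 PRESCALE (8 bits)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ControlReg(u32);

impl ControlReg {
    const ENABLE_SHIFT: u32 = 0;
    const MODE_SHIFT: u32 = 1;
    const MODE_MASK: u32 = 0b111;
    const PRESCALE_SHIFT: u32 = 8;
    const PRESCALE_MASK: u32 = 0xFF;

    /// Reads the `mask`-wide field that starts at bit `shift`.
    fn get(self, shift: u32, mask: u32) -> u32 {
        (self.0 >> shift) & mask
    }

    /// Clears the field, then writes `value` truncated to the field's width.
    fn set(self, shift: u32, mask: u32, value: u32) -> Self {
        ControlReg((self.0 & !(mask << shift)) | ((value & mask) << shift))
    }

    fn enabled(self) -> bool { self.get(Self::ENABLE_SHIFT, 1) == 1 }
    fn with_enabled(self, on: bool) -> Self { self.set(Self::ENABLE_SHIFT, 1, on as u32) }
    fn mode(self) -> u32 { self.get(Self::MODE_SHIFT, Self::MODE_MASK) }
    fn with_mode(self, m: u32) -> Self { self.set(Self::MODE_SHIFT, Self::MODE_MASK, m) }
    fn prescale(self) -> u32 { self.get(Self::PRESCALE_SHIFT, Self::PRESCALE_MASK) }
    fn with_prescale(self, p: u32) -> Self { self.set(Self::PRESCALE_SHIFT, Self::PRESCALE_MASK, p) }
}

fn main() {
    let reg = ControlReg(0).with_enabled(true).with_mode(5).with_prescale(0x40);
    println!(
        "raw = {:#010x}, enabled = {}, mode = {}, prescale = {:#x}",
        reg.0, reg.enabled(), reg.mode(), reg.prescale()
    );
}
```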
On top of that, on nightly you have this flag:
-Z print-type-sizes -- print layout information for each type encountered
This is what the output looks like: https://github.com/rust-lang/rust/blob/13fd5e93deb41045c4de8...
"the Mozilla Rust team awards contracts, grants and internships more directly. We haven't tended to make a lot of fanfare around these in the past, but the general strategy has been to support people who have been doing amazing volunteer work so that they can do that work full time for a stretch. That's covered work from eddyb on the compiler, Integer 32 on crates.io and Rust training, dtolnay on Serde and other library work, jseyfried on the new macro system, carllerche on mio and Tokio, and withoutboats on a variety of topics -- and that's all this year! Beyond that, we have academic grants for research into Rust for HPC (really large-scale servers/clusters), and foundational work around the unsafe guidelines/formal semantics for Rust."
"In general we have a preference toward supporting people who are already doing great work in the ecosystem, rather than continually expanding the internal Mozilla team. We want Rust to grow and find support far beyond Mozilla, and this is one way of nudging things in that direction, while bolstering the ecosystem at the same time."
"While we've awarded almost all of our contracting money for this year, please feel free to reach out to me at aturon@mozilla.com if you have interest in small (~3 months) contract work to help push a piece of the Rust ecosystem (or tooling, or docs, or ...) over the finish line."
I don't want to leave the Rust ecosystem, and I'd kill to keep working in it. But I didn't realize that there was any source of general funding from Mozilla specifically targeted at Rust. This whole thing was done simply because I'm bored and had a lot of free time and also we thought it would only take a few weeks. Money was far from the motivation initially, but certainly it occurred to me later on that it's amazing for the resume.
https://github.com/rust-lang/rust/blob/master/src/libstd/col...
It would be cool to see if the "we'll manage our own pointers, thanks" approach makes less sense with this optimization in place. It would be nice; the source is pretty inscrutable, mostly in the interest of performance, I think.
Edit: Or, maybe more importantly: in the untyped memory they store hashes and (K,V) pairs separately, so I think the only opportunity is to swap (K,V) for (V,K).
Rustc special-cases them in a lot of places to do things that I will not even pretend to understand, and not reordering 2-element structs keeps them working. Since this already took 6 months of my free time (I have a lot of free time, though not 40 hours a week admittedly), this was for the best.
You can make an argument that we should have, but it doesn't matter from the user's perspective. (a, b) is the same size as (b, a) no matter what a and b are.
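A quick check of that claim in Rust, using arbitrary element types:

```rust
use std::mem::size_of;

fn main() {
    // Swapping the two elements never changes the tuple's size.
    println!("(u8, u32):  {} vs {}", size_of::<(u8, u32)>(), size_of::<(u32, u8)>());
    println!("(u16, u64): {} vs {}", size_of::<(u16, u64)>(), size_of::<(u64, u16)>());
}
```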
It seems like a great way to boost performance, but performance isn't everything. Compiler simplicity should also be a goal (though perhaps difficult with Rust, I think it's worth some effort). Debuggability should be a goal.
And I think some kind of ABI guarantees would be nice eventually after the rest of the language stabilizes.
There's also a helper function that lets you invert it. The part of the compiler that builds LLVM types just uses this.
The cost here isn't complexity, it's that "Fields are in increasing order by offset" has been baked into the compiler for years and you have to make sure to change them all.
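As an illustration of the inversion (a sketch; `invert_permutation` is a hypothetical name, not necessarily the compiler's helper): if `memory_index[i]` gives the memory position of the i-th declared field, inverting that permutation gives, for each memory position, the declared field stored there. Walking the result visits fields in increasing offset order, which is the order that code assuming "fields are in increasing order by offset" expects.

```rust
/// Hypothetical helper: `memory_index[i]` is the memory position of the i-th
/// declared field; the result maps each memory position back to a declared field.
fn invert_permutation(memory_index: &[u32]) -> Vec<u32> {
    let mut inverse = vec![0u32; memory_index.len()];
    for (field, &position) in memory_index.iter().enumerate() {
        inverse[position as usize] = field as u32;
    }
    inverse
}

fn main() {
    // Declared fields 0, 1, 2 are stored at memory positions 2, 0, 1.
    let memory_index = [2, 0, 1];
    let in_offset_order = invert_permutation(&memory_index);
    // Field 1 comes first in memory, then field 2, then field 0.
    println!("{:?}", in_offset_order);
}
```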