
Optimizing Rust Struct Size - jscholes
http://camlorn.net/posts/April%202017/rust-struct-field-reordering.html
======
mockery
This is pretty cool!

A few thoughts from a C/C++ perspective (I have only tinkered briefly with
Rust):

- This effectively makes the optimization algorithm part of the Rust ABI; you
cannot change (or even fix a bug in) the layout algorithm without breaking
binary compatibility. The simpler the algorithm, the less of a concern this
would be.

- In a systems programming context, you don't always want to optimize a
struct for size. You may reorder a large struct to optimize for other
properties like cache locality, or because you have to match a memory layout
specified by other software or even hardware. Obviously one can still use
'repr(C)' for these cases.

- Related to the prior point, it's a little disconcerting that adding a field
to the end of a struct declaration can completely change the layout. Sometimes
in C/C++ land programmers 'cheat' by adding fields to the end of a struct
that's used in multiple modules, and initially only recompile the modules that
need to know about the new field. (For purposes of compilation
expediency/iteration speed.)

- This seems like it may violate the principle of least surprise pretty
badly. As long as we continue to use debuggers and have crash dumps, systems
programmers will sometimes need to look at raw memory data and figure out what
the structure is. If the field layout isn't easily predicted, that may be very
difficult. In my experience there can be huge benefits to minimizing the
number of opaque/hard-to-predict transformations between authored data and its
final representation. (In this case, the struct declaration and its final
memory layout.)

For these reasons it seems like this auto-reorder behavior might be better as
an opt-in rather than opt-out behavior (which it sounds like it is.)
Regardless, it's still a feature I wish I could have in C++ from time to time!
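
For readers who want to see the difference being discussed, here is a minimal sketch. The struct names are made up for illustration, and the only guaranteed number is the `repr(C)` one (on typical targets where `u32` is 4-byte aligned); the default-representation size is whatever the compiler chooses.

```rust
use std::mem::size_of;

// repr(C): fields stay in declaration order, with padding inserted
// so that each field is correctly aligned.
#[repr(C)]
#[allow(dead_code)]
struct DeclOrder {
    a: u8,
    b: u32,
    c: u8,
}

// Default (Rust) representation: the layout is unspecified, so the
// compiler is free to reorder the fields to reduce padding.
#[allow(dead_code)]
struct MayReorder {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("repr(C): {} bytes", size_of::<DeclOrder>());
    println!("default: {} bytes", size_of::<MayReorder>());
}
```

With `repr(C)` the struct needs 3 bytes of padding after `a` and 3 more after `c`, which is the waste the reordering is meant to remove.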

~~~
kibwen
_> Sometimes in C/C++ land programmers 'cheat' by adding fields to the end of
a struct that's used in multiple modules, and initially only recompile the
modules that need to know about the new field. (For purposes of compilation
expediency/iteration speed.)_

How would that work, with some modules assuming an incorrect size for a given
type?

I agree with your concerns regarding debuggers and crash dumps, though in my
experience you need debuggers far less often when writing Rust code, and when
you do need them (e.g. for unsafe code) you've often already used `repr(C)` on
the types of interest.

~~~
eddyb
The correct type layout is already encoded in debuginfo, so I'm not sure how
that's a problem unless you're avoiding a debugger intentionally or simply
can't use one?

~~~
kibwen
Do we generate debuginfo by default at -O? If so, then TIL. :)

~~~
eddyb
Not by default, no, but what I mean is that if you're debugging you can just
enable debuginfo and _that_ will contain the descriptions of the types
including all fields, with the right offsets and everything.
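
If a debugger really isn't available, the offsets can also be printed from inside the program. This is only a sketch with made-up names (`Sample`, `offset_of`, `offsets`), not a substitute for debuginfo, and the numbers it prints are whatever the compiler picked for this build.

```rust
#[allow(dead_code)]
struct Sample {
    a: u8,
    b: u32,
    c: u8,
}

// Byte offset of a field, computed from addresses inside a live value.
fn offset_of<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

// Offsets of `a`, `b` and `c`, in declaration order.
fn offsets(s: &Sample) -> [usize; 3] {
    [offset_of(s, &s.a), offset_of(s, &s.b), offset_of(s, &s.c)]
}

fn main() {
    let s = Sample { a: 1, b: 2, c: 3 };
    let [a, b, c] = offsets(&s);
    println!(
        "a at {a}, b at {b}, c at {c}, size {}",
        std::mem::size_of::<Sample>()
    );
}
```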

------
threeseed
For those who didn't read the whole thing: the author is blind!

Very impressive effort to work on something as technical as optimising a Rust
compiler.

And to write about it so eloquently just makes it even more impressive.

~~~
vanderZwan
I wonder if the rise of standardised formatting for source code helps here?

~~~
CAMLORN
Depends on the standard. There are parts of pep8 that are nearly impossible
for me to follow, namely the bits about indenting complex expressions that
span multiple lines. I mostly don't care about it, save for indenting by fixed
amounts at every {. My screen reader can announce changes in indentation
level, so it's a way to not have to remember if we're 3 or 10 braces in.

I make the effort to match formatting but I hate it because I get basically no
benefit, and the early reviews here had lots of places where eddyb made me
change it until I finally learned all the rules for this project. A standard
formatting tool and official list of rules for Rust is in the works. That'll
be nice to have.

~~~
vanderZwan
Thank you for your reply, I never would have thought of indentation being
useful. It would be interesting to see what a programming language designed to
be optimal for conveying information through screen readers would turn into. I
wouldn't be surprised if it would lead to something with very nice formatting
for people who don't need such tools as well.

~~~
CAMLORN
This is admittedly a bit ranty, but it hits some important points that a lot
of sighted people miss, so I'm posting it anyway.

People have started going down the blind programming language road; look up
Quorum. What's not obvious about it is that mostly it's used at schools for
the blind.

But I think it's incredibly stupid. I'm a very good C++ programmer. I'm at
least reasonable at Python. I know enough Haskell that I'm no longer actively
frightened of the monad (i.e. a newbie, but past the biggest hurdle).
Obviously I both know and actively like Rust.

You can make a programming language for the blind. You can't make people use
it. If we go down that path, we end up in a world where blind programmers work
for the blind programming shop because the rehab agencies push you there. And
then you make less money.

I know at least 10 other blind programmers. 3 of them either work or have
worked at Microsoft. 2 of them maintain my screen reader, which is something
around 40000 lines of C++ and Python doing incredibly complicated things to
fully hook into the OS. One of them is either currently at or has worked at
Google; I'm not in touch with him anymore, so I don't know if he moved on or
not.

We don't need programming languages for the blind. We need people to stop
tying technologies closely to IDEs and then not making said IDEs accessible.

My other project, Libaudioverse, is cross platform. The irony is that this is
because VS was inaccessible. I used CMake instead and learned cdb when I
needed a debugger. VS is at the point of dreadfully inconvenient but workable
now, but I still don't use the Microsoft stack. It puts me in the position of
being at their mercy--if they roll out a new UI tomorrow and it's inaccessible
(as they did with 2010), all my personal projects stop instantly, and I maybe
lose my job when my employer upgrades it, depending how bad it is. Microsoft
is one of the better places for accessibility; there are issues with VS for
screen reader users that have been around and reported for years.

------
pornel
That's great work and a great write-up. I'm happy it finally landed.

BTW: Rust team now runs unit tests of packages on crates.io, so each compiler
release is verified against almost all public Rust code.

~~~
kibwen
To elaborate: for a long time (since before 1.0) Rust has had a tool called
Crater, which attempts to compile every library on crates.io using two
different versions of the compiler, to see if the newer version of the
compiler has caused any regressions. What's happened recently is that Crater
has been superseded by a new tool, called Cargobomb, which not only compiles
public Rust code, but also attempts to run test suites (among other things,
see
[https://github.com/brson/cargobomb/blob/master/README.md](https://github.com/brson/cargobomb/blob/master/README.md)).
What's great about this new approach is that it can find regressions that
wouldn't manifest as compilation errors, such as the field reordering changes
described in the OP.

~~~
kbenson
> which not only compiles public Rust code, but also attempts to run test
> suites

Ah, giving CPAN Testers[1] some competition. :)

If you haven't already got plans for it (or already done it!), I highly
suggest starting an initiative to get lots of donated resources so you can
really fill out a testing matrix and make it public[2]. Perl's CPAN is a good
place to look for what works and what they would do differently (or what they
have planned), as they have an _amazingly_ robust module testing ecosystem.

1: [http://www.cpantesters.org/](http://www.cpantesters.org/)

2: [http://matrix.cpantesters.org/?dist=Path-Tiny+0.104](http://matrix.cpantesters.org/?dist=Path-Tiny+0.104)

------
kibwen
Using fuel to enable a binary search to uncover misapplied optimizations is
very clever! I always just figured that fuel was used to avoid pathological
behavior in the optimizer by putting an upper bound on the time spent
optimizing.
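
A rough sketch of the binary search idea, for anyone who hasn't seen it. Everything here is hypothetical: `tests_pass` stands in for "compile with at most `fuel` optimizations applied, then run the test suite", and the real rustc interface is not shown.

```rust
// Hypothetical stand-in for a real build-and-test run. With `fuel` = N the
// first N optimizations are applied; the build breaks once the bad one
// (number `first_bad`, counting from 1) is included.
fn tests_pass(fuel: u32, first_bad: u32) -> bool {
    fuel < first_bad
}

// Smallest fuel value at which `check` fails, assuming it passes at `lo`
// and fails at `hi`. The optimization enabled last at that fuel value is
// the one to look at.
fn first_failing_fuel(mut lo: u32, mut hi: u32, check: impl Fn(u32) -> bool) -> u32 {
    // Invariant: check(lo) passes and check(hi) fails.
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if check(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    hi
}

fn main() {
    // Pretend optimization number 37 out of 100 is the misapplied one.
    let culprit = first_failing_fuel(0, 100, |fuel| tests_pass(fuel, 37));
    println!("first failing fuel value: {culprit}");
}
```

Each probe is a full rebuild and test run, so the logarithmic number of steps is what makes this practical.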

~~~
CAMLORN
The first time I think I saw the idea was [0], but I don't know if this is the
post that got linked in the discussion threads. Rust didn't have fuel at all
until after my changes, and I don't think it's necessary to use it as a limit
unless you're letting the optimizer run on the optimizer's output until it
stops changing. The number of optimizations Rustc can run is already finite
because it doesn't do so.

Nonetheless, the idea isn't original to us.

0: [http://blog.ezyang.com/2011/06/debugging-compilers-with-opti...](http://blog.ezyang.com/2011/06/debugging-compilers-with-optimization-fuel/)

------
glandium
While this might be pretty rare, I can see how one would want some fields in
large structs to stay close to one another to stay in the same cache lines. If
the compiler moves things at will, you lose the ability to do such things... I
guess you could still use repr(C) and optimize manually, then...

~~~
dbaupp
One thing to consider in this respect is the optimisation won't/can't separate
out the subfields of a field, meaning `struct Foo { ... x: SomeStruct ... }`
will reorder x as a whole, keeping the fields of SomeStruct together. Fields
that are accessed together often could theoretically be pulled out into a
separate type, which, depending on the codebase, may be a nice idea anyway.
(Similarly, one could group the fields in a tuple, `x: T, y: U` => `xy: (T,
U)`, although this is... rather ugly.)

This property also ensures a (hypothetical) type like `CachePadded<T>`, which
keeps a value of type T in its own cache line, works as expected.
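
A sketch of what that could look like. `CachePadded`, `Hot` and `Queue` are illustrative names, the 64-byte line size is an assumption about the target CPU, and `repr(align(N))` is used as it exists in current Rust.

```rust
use std::mem::{align_of, size_of};

// Hypothetical wrapper: 64-byte alignment also rounds the size up to a
// multiple of 64, so the value never shares an (assumed 64-byte) cache line.
#[repr(align(64))]
#[allow(dead_code)]
struct CachePadded<T> {
    value: T,
}

// Fields that are accessed together, pulled out into their own type so
// they are moved around as a single unit.
#[allow(dead_code)]
struct Hot {
    head: u32,
    tail: u32,
}

#[allow(dead_code)]
struct Queue {
    flag: u8,
    hot: CachePadded<Hot>,
    len: u64,
}

fn main() {
    println!("align of CachePadded<Hot>: {}", align_of::<CachePadded<Hot>>());
    println!("size of CachePadded<Hot>:  {}", size_of::<CachePadded<Hot>>());
    println!("size of Queue:             {}", size_of::<Queue>());
}
```

Wherever the compiler places `hot` inside `Queue`, `head` and `tail` stay next to each other inside it.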

------
Animats
That's always been a question in language design - follow the user's layout,
or pack for the machine? For C compatibility, anything passed to C now needs
'repr(C)'. One would think that would be 'repr("C")', since here, "C" is
neither a reserved word nor a variable, but, whatever.

Pascal had "packed" as an option for structures and arrays. Using "packed
array of boolean" got you a bit array. Rust has "repr(packed)", but it was
broken as of Rust 1.0. Something that has no machine address, such as an
individual bit, is apparently troublesome. C supports bit fields in structs,
but that feature is seldom used.

~~~
eddyb
> now needs 'repr(C)'

It always has, we lint for this, and the Rust struct layout has been
officially left unspecified (making assumptions based on it UB) from before
1.0, anyway.

> One would think that would be 'repr("C")', since here, "C" is neither a
> reserved word nor a variable, but, whatever.

Neither is 'repr' - they're both just identifiers in an attribute.
#[attr("string literal")] is newer and still unstable (as part of the new
macro system).

The bit array packing is something we've wanted to do for a long while, but it
requires an opt-in along the lines of "disallow taking references to any (sub-
byte) fields".

#[repr(packed)] is the same as the C equivalent, be it attribute or pragma, in
that it only removes alignment padding and does nothing to booleans.
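
A small sketch to make that concrete (struct names invented for the example; the `repr(C)` size assumes `u32` is 4-byte aligned):

```rust
use std::mem::size_of;

// Declaration order with normal alignment padding.
#[repr(C)]
#[allow(dead_code)]
struct Padded {
    tag: u8,
    value: u32,
    x: bool,
    y: bool,
}

// Same fields, but with alignment padding removed.
#[repr(packed)]
#[allow(dead_code)]
struct Packed {
    tag: u8,
    value: u32,
    x: bool,
    y: bool,
}

fn main() {
    println!("padded: {} bytes", size_of::<Padded>());
    println!("packed: {} bytes", size_of::<Packed>());
    println!("bool:   {} byte", size_of::<bool>());
}
```

The packed version is 1 + 4 + 1 + 1 bytes: the padding is gone, but each `bool` still takes a whole byte rather than a bit.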

------
ryanschneider
What if there was an option to rewrite the source .rs file into the matching
layout? I'd imagine that could be useful in scenarios where one has to
manually do something with a dump or disassembly, etc. Plus then it could be
linted against if a project wanted to ("code does not match in memory layout,
use --rearrange-structs to correct").

~~~
CUViper
In generic code, the generated layouts can't even be written in a universal
way, because they can vary depending on those input types.
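
To illustrate, here is a sketch with a made-up generic `Triple` type. The order it prints is whatever the compiler chose for each instantiation, so no particular output is promised; the `repr(C)` twin is only there for comparison.

```rust
use std::mem::size_of;

// Default representation: layout is chosen per instantiation.
struct Triple<A, B, C> {
    a: A,
    b: B,
    c: C,
}

// Same fields pinned to declaration order, for comparison.
#[repr(C)]
#[allow(dead_code)]
struct TripleC<A, B, C> {
    a: A,
    b: B,
    c: C,
}

// Field names sorted by their actual offset in memory.
fn memory_order<A, B, C>(t: &Triple<A, B, C>) -> [&'static str; 3] {
    let base = t as *const Triple<A, B, C> as usize;
    let mut fields = [
        (&t.a as *const A as usize - base, "a"),
        (&t.b as *const B as usize - base, "b"),
        (&t.c as *const C as usize - base, "c"),
    ];
    fields.sort();
    [fields[0].1, fields[1].1, fields[2].1]
}

fn main() {
    let x = Triple { a: 1u8, b: 2u32, c: 3u8 };
    let y = Triple { a: 1u32, b: 2u8, c: 3u16 };
    println!("Triple<u8, u32, u8>:  {:?}, {} bytes", memory_order(&x), size_of::<Triple<u8, u32, u8>>());
    println!("Triple<u32, u8, u16>: {:?}, {} bytes", memory_order(&y), size_of::<Triple<u32, u8, u16>>());
    println!("TripleC<u8, u32, u8>: {} bytes", size_of::<TripleC<u8, u32, u8>>());
}
```

One source-level field order cannot describe both instantiations if the compiler picks different orders for them.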

------
microcolonel
Somebody should hire this guy to work on the compiler full-time.

~~~
kibwen
Mozilla gives the Rust devs an annual budget for hiring contractors, and
indeed these often end up going to well-known community members who are
tackling major initiatives. I don't know if camlorn is one of these people
(he's well-known, certainly), but quoting aturon from
[https://www.reddit.com/r/rust/comments/64t251/mozilla_awards...](https://www.reddit.com/r/rust/comments/64t251/mozilla_awards_50000_to_the_tokio_asynchronous_io/dg62o0a/)
:

 _" the Mozilla Rust team awards contracts, grants and internships more
directly. We haven't tended to make a lot of fanfare around these in the past,
but the general strategy has been to support people who have been doing
amazing volunteer work so that they can do that work full time for a stretch.
That's covered work from eddyb on the compiler, Integer 32 on crates.io and
Rust training, dtolnay on Serde and other library work, jseyfried on the new
macro system, carllerche on mio and Tokio, and withoutboats on a variety of
topics -- and that's all this year! Beyond that, we have academic grants for
research into Rust for HPC (really large-scale servers/clusters), and
foundational work around the unsafe guidelines/formal semantics for Rust."_

 _" In general we have a preference toward supporting people who are already
doing great work in the ecosystem, rather than continually expanding the
internal Mozilla team. We want Rust to grow and find support far beyond
Mozilla, and this is one way of nudging things in that direction, while
bolstering the ecosystem at the same time."_

 _" While we've awarded almost all of our contracting money for this year,
please feel free to reach out to me at aturon@mozilla.com if you have interest
in small (~3 months) contract work to help push a piece of the Rust ecosystem
(or tooling, or docs, or ...) over the finish line."_

~~~
CAMLORN
Didn't know about this. I'll have to see if I can stay involved with some sort
of project--the problem is that, currently, the one I was doing
is...well...done.

I don't want to leave the Rust ecosystem, and I'd kill to keep working in it.
But I didn't realize that there was any source of general funding from Mozilla
specifically targeted at Rust. This whole thing was done simply because I'm
bored and had a lot of free time and also we thought it would only take a few
weeks. Money was far from the motivation initially, but certainly it occurred
to me later on that it's amazing for the resume.

------
openasocket
Anyone know how this plays with generics? Will it generate different, optimal
layouts for each instantiation, i.e. will HashMap<u64,u64> have a different
layout than HashMap<u16,u32>? Because that would be a huge win.

~~~
CAMLORN
Yes, it will do exactly that.

~~~
frankmcsherry
I think in this case it doesn't, because the `HashMap` implementation just
stores everything behind an untyped pointer.

[https://github.com/rust-lang/rust/blob/master/src/libstd/col...](https://github.com/rust-lang/rust/blob/master/src/libstd/collections/hash/table.rs#L115-L123)

It would be cool to see if the "we'll manage our own pointers, thanks"
approach makes less sense with this optimization in place. It would be nice;
the source is pretty inscrutable, mostly in the interest of performance, I
think.

Edit: Or, maybe more importantly: in the untyped memory they store hashes and
(K,V) pairs separately, so I think the only opportunity is to swap (K,V) for
(V,K).

~~~
CAMLORN
I haven't looked at libstd, so I can't comment on that specifically. But we
don't reorder two-field structs.

Rustc special cases them in a lot of places to do things that I will not even
pretend to understand, and not reordering 2-element structs keeps them
working. Since this already took 6 months of my free time (I have a lot of
free time, though not 40 hours a week admittedly), this was for the best.

You can make an argument that we should have, but it doesn't matter from the
user's perspective. (a, b) is the same size as (b, a) no matter what a and b
are.
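
A quick way to check the two-field claim, sketched with `repr(C)` so the field order is pinned and both orders can be compared directly (the type names are invented, and the concrete sizes assume `u32` is 4-byte aligned):

```rust
use std::mem::size_of;

// Two fields in a fixed, declared order.
#[repr(C)]
#[allow(dead_code)]
struct Pair<A, B> {
    first: A,
    second: B,
}

// Three fields in a fixed order, to show where order starts to matter.
#[repr(C)]
#[allow(dead_code)]
struct Three<A, B, C> {
    first: A,
    second: B,
    third: C,
}

fn main() {
    println!("Pair<u8, u32>: {}", size_of::<Pair<u8, u32>>());
    println!("Pair<u32, u8>: {}", size_of::<Pair<u32, u8>>());
    println!("Three<u8, u32, u8>: {}", size_of::<Three<u8, u32, u8>>());
    println!("Three<u32, u8, u8>: {}", size_of::<Three<u32, u8, u8>>());
}
```

With two fields the padding only moves (between the fields, or to the end), so the total is the same either way; with three fields the order can change the total.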

------
Ono-Sendai
Interesting. Did it result in any speedups?

------
jeffdavis
I wonder whether this would be better as an external tool? It seems almost
inappropriate for a systems language.

It seems like a great way to boost performance, but performance isn't
everything. Compiler simplicity should also be a goal (though perhaps
difficult with rust, I think it's worth some effort). Debuggability should be
a goal.

And I think some kind of ABI guarantees would be nice eventually after the
rest of the language stabilizes.

~~~
CAMLORN
In fairness, the compiler is actually simpler now. Perhaps not as simple as if
fields were always in order, but we only determine what type layouts should be
in one place instead of two now, and 99% of what you have to do in the rest of
it is just foo.memory_index[i] instead of i in all the places you want a field
index.

There's also a helper function that lets you invert it. The part of the
compiler that builds LLVM types just uses this.
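
Roughly, the mapping and its inverse look like this. This is a sketch, not the compiler's code: it assumes `memory_index[i]` gives the in-memory position of the i-th field in source order, and `invert` is a made-up name for the helper.

```rust
// memory_index[i] = position in memory of the i-th field in source order
// (assumed meaning). The result maps the other way: inverse[m] = source
// index of the field stored at memory position m.
fn invert(memory_index: &[usize]) -> Vec<usize> {
    let mut inverse = vec![0; memory_index.len()];
    for (source_idx, &mem_idx) in memory_index.iter().enumerate() {
        inverse[mem_idx] = source_idx;
    }
    inverse
}

fn main() {
    // Source order: a, b, c. In memory: b first, then c, then a.
    let memory_index = [2, 0, 1];
    let inverse = invert(&memory_index);
    println!("source -> memory: {:?}", memory_index);
    println!("memory -> source: {:?}", inverse);
}
```

Code that walks fields in memory order (such as something emitting a type field by field) wants the inverse; code that looks up a named field wants the forward mapping.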

The cost here isn't complexity, it's that "Fields are in increasing order by
offset" has been baked into the compiler for years and you have to make sure
to change them all.

------
jitl
Great write-up!

------
mirekrusin
This is impossible. His brain should be connected directly to TensorFlow to
continue his work.

