
The Lost Art of C Structure Packing (2014) - bramv
http://www.catb.org/esr/structure-packing/
======
kazinator
TL;DR: to get smaller structures, don't do stuff like this:

    
    
       struct foo {
         char a;
         int b;
         char c;
         int d;
       };
    

but this:

    
    
       struct foo {
         int b;  // or int b, d;
         int d;
         char a;
         char c;
       };
    

Basically, if you sort the members by size in descending order, you get
optimal packing without messing with compiler-specific packing extensions that
skew alignment and possibly bloat code.

The worst that you will get is padding at the end of the structure so that if
two or more of them are arrayed, the first member is correctly aligned at all
the array indices.

~~~
dysoco
Alright this might be a stupid question, but if packing is about the order in
which the fields are arranged... shouldn't the compiler optimize that? I mean,
it does much more complex optimizations already doesn't it?

~~~
gaius
What if your struct is actually a memory-mapped I/O device? Far from
optimizing you need to give a hard guarantee that it will NEVER be optimized
by the compiler.

~~~
kazinator
Suppose C didn't require order of struct members. How could we support I/O
devices? Simply by wrapping them behind functions or macros which take a base
address and access a specific register relative to that address using pointer
arithmetic and dereferencing. You see this being done anyway!

    
    
        if ((SERIAL_STATUS_PORT(port_addr) & SERIAL_DTR) != 0) {
          /* DTR line is asserted */
        }

~~~
gaius
More likely you'd have to drop down into ASM for those bits...

You see in a lot of high performance networking code too, you just get a block
of bytes off the wire, cast it to a struct and use it straight away. High
risk, high reward :-)

------
mschwaig
See also previous discussions of this article from 2014 and 2015:

[https://news.ycombinator.com/item?id=6995568](https://news.ycombinator.com/item?id=6995568)

[https://news.ycombinator.com/item?id=9069031](https://news.ycombinator.com/item?id=9069031)

------
minipci1321
Sigh... apparently this is now enough of rocket science to be considered for
posting on HN...

People using packing pragma of GCC should also beware -- an access to a field
of a packed structure will be done bytewise, whether a variable happens to be
actually aligned or not (I guess the compiler simplified its life by assuming
no variable is ever aligned in packed structs), so memory size would go down
but CPU use might grow.

At least this used to be true a few years ago, haven't reverified recently.

~~~
jcranmer
On x86, most memory can be accessed unaligned, and unless you cross a cache
line boundary, there's effectively no penalty. (If you're on Haswell or newer,
even if you cross a cache line boundary, there's no latency hit for an L1
cache hit.)

The code gcc emits at -O3 on x86-64 to load b from a packed
struct { char a; double b; } is:

    
    
        movsd 1(%rdi), %xmm0
    

So it looks like gcc is able to condense the load on platforms with unaligned
accesses. On ARM, it does look like the double is loaded byte-by-byte.

~~~
minipci1321
> So it looks like gcc is able to condense the load on platforms with
> unaligned accesses.

It seems there are still some leftovers -- try: struct { int i:31; }; with and
without __attribute__((__packed__)).

------
Uptrenda
Very nice guide. I remember seeing extremely similar code when I was working
with raw TCP packets and I always wondered what all the esoteric rules were
for working with raw struct memory. There was nowhere that seemed to explain
things happening on this level so it kind of just never made any sense to me.
Guess now I can finally figure out what all that code meant, awesome.

~~~
khedoros
A few years ago, I ran into some situations where packed structs would be
useful. I ended up writing a lot of little test programs to verify struct
sizes based on different element orders, packing pragmas, etc. It was
certainly illuminating, but it would've been helpful having a write-up like
this to work from.

It got even more complicated when I threw unions into the mix. Knowing how the
data is stored at the bit level becomes important in some of those cases.

------
ThatGeoGuy
Although I somewhat tire of constant comparisons between Rust and C, in this
case it's interesting so I'll ask:

Does Rust do struct packing any differently than C and what kinds of tradeoffs
are associated with that? Most people don't even think about struct packing in
the context of C++ (which is in many ways closer to Rust) because of vtables /
inheritance / etc, but in this case I'd like to know if Rust does anything
differently or requires something new to think about in this respect.

~~~
Crespyl
IIRC, Rust struct layout is undefined/unstable, _unless_ you explicitly use
the #[repr(C)] attribute on the struct definition, in which case it behaves
pretty much exactly like C. You can also specify that the struct be packed,
with no padding between items, if necessary.

I'm not sure if unmarked Rust structs actually get layout optimized or not
right now, but they've worked to keep that option available.

~~~
Gankro
Unless someone's recently bothered to implement it, I believe Rust doesn't
currently bother to optimize your struct layout. It _can_ and it surely _will_
but not yet (this is a dangerous state of affairs since people may start to
rely on it).

------
jgrahamc
Ah, yes. BUS ERROR:
[http://blog.jgc.org/2007/04/debugging-solaris-bus-error-caus...](http://blog.jgc.org/2007/04/debugging-solaris-bus-error-caused-by.html)

~~~
Roboprog
Using a byte array and manually marshalling stuff in and out is tedious, but
it does avoid alignment issues.

There's a certain amount of irony that dynamic languages (e.g. Perl) make it
easier to implement something like "pack" and "unpack" to move values in and
out of such a byte array, due to the variable type arguments/results of the
"unpacked" values.

~~~
Roboprog
Otherwise, I assume that avoiding this problem is a matter of putting the
larger elements at the start of the struct, and the smaller elements at the
end (with the exception that for a nested array, you consider the element
size, not the total array)

8 byte entries, then 4 byte entries, 2 byte entries, chars/bytes, bits.

------
rwmj
Strange that he doesn't mention dwarves
([https://github.com/acmel/dwarves/](https://github.com/acmel/dwarves/)
[http://www.ukuug.org/events/linux2007/2007/papers/Melo.pdf](http://www.ukuug.org/events/linux2007/2007/papers/Melo.pdf)).

Edit: He does mention it in passing but says he hasn't used it. Suggest that
he _does_ use it because it's a really useful tool and lots of the manual
stuff he's doing is better done automatically.

------
Roboprog
You can sometimes save some memory in these sort of "flyweight" records by
moving all the strings in a group into a string table. Make one byte-array
buffer for the text from multiple records, then use a ("small") offset integer
in each record to refer to the starting position of strings within the string
table.

Rather than using a 64 bit pointer, or an inline array of max-size, for each
string, just use perhaps a 16 bit offset into the table for each string. (or
32 bit if you expect enough data).

Of course, this also requires wrapper setter/getter type code, but it can be a
good trade to save space.

------
blaisio
Is this really a lost art? I feel like this is one of the main reasons people
who use C still use C: you have a lot more control over how memory is handled.
Honestly, if someone doesn't know about this, then they really just don't know
C, because it's a pretty fundamental piece of information.

------
Animats
The bit-field feature of C structs is underutilized. People are still writing
hex constants and using AND and OR to clear and set bits. Let the compiler do
that; it's more readable. As of C99, there's named structure initialization,
which makes bitfield constants more readable.

~~~
stephencanon
C11 §6.7.2.1: "The order of allocation of bit-fields within a unit (high-order
to low-order or low-order to high-order) is implementation-defined. The
alignment of the addressable storage unit is unspecified."

This makes bitfields quite difficult to use correctly when portability is of
interest. It's often easier to write a couple inline get/set functions and use
bit indices. If portability isn't a concern, then sure, have at it.

~~~
Animats
That's an escape clause for endian issues. If you write hex constants, you
still have endian issues.

------
tboneatx
Knowing about struct memory alignment can come in very handy when
reverse-engineering 3rd party binary protocols. Often you come across many
"unknown" bytes which turn out to be just padding from 4-byte struct
alignment and can be discarded. It saved me a bunch of time not having to try
to figure out what they were.

------
_RPM
I compile with -Wpadded when I'm building something to serialize structs to
disk. It helps. I know sizeof will return the value with the padding, but I'd
rather do it myself.

------
jevinskie
I wish pahole got more attention since I had some great experiences with it
years ago. I tried it recently and it didn't like the DWARF info that my
compiler was spitting out. Perhaps I need to tweak the DWARF versioning or,
_gulp_ , try and add the missing DWARF support to pahole.

EDIT: I'm starting to think my "recent" testing hasn't been too recent. It
looks like development picked up again mid-2015. I'll have to give it another
run!

[http://git.kernel.org/cgit/devel/pahole/pahole.git/log/](http://git.kernel.org/cgit/devel/pahole/pahole.git/log/)

------
jstelly
This can also be important for routines that treat structures as strings of
bytes. e.g. void MD5( unsigned char *pStructure, size_t size );

struct foo { char a; int x; };

if I MD5( &foo, sizeof(foo) ); on two foo with the same a & x, they may not
produce the same MD5 because the pad bytes might contain noise from the stack
or heap.

The solution is to either zero memory on anything you will MD5 or explicitly
declare all of the padding (easy if you have internalized the rules) so your
code can handle it.

------
qwertyuiop924
ESR is a good programmer, and this is a handy guide for those who don't know
how to pack structures. He's also insane, so don't trust anything he says.

Oh, you don't think he's insane? He thinks there's a conspiracy amongst women
in open source to discredit Linus Torvalds. No, I'm not joking. I wish I was.

~~~
andrepd
Offtopic. His opinions on open source politics matter little to his
technical insights. I can appreciate Hitler's paintings, etc etc.

~~~
qwertyuiop924
...Which is why despite that, I still think this article is good, and said so.

------
Annatar
esr, brilliant as always.

However:

 _since shipping the first version of this guide I have been asked why, if
reordering for minimal slop is so simple, C compilers don’t do it
automatically. The answer: C is a language originally designed for writing
operating systems and other code close to the hardware. Automatic reordering
would interfere with a systems programmer’s ability to lay out structures that
exactly match the byte and bit-level layout of memory-mapped device control
blocks._

If the programmer wants or needs absolute control, just do not provide the
reordering option to the optimizing compiler. So _why_ wouldn't compilers
provide a command line _option_ to do automatic reordering?

~~~
qwertyuiop924
Brilliant, but crazy.

------
dspeyer
Last time I manually packed structures I was doing GPU programming. I forget
the details, but the CPU and GPU had different alignment requirements so
anything other than a manually packed structure broke in weird ways.

------
sfrailsdev
My understanding (and if I'm wrong, tell me) is that if you try to serialize or
save to disk something with a bitfield directly, you can have big problems.

~~~
bluGill
Bit-field order is "implementation defined". So you need to ask the vendor
of every compiler you support what the compiler does. Once you know what the
compilers you support do, you can write code specific to them and everything
works great. It is only when someone tries a different compiler (including an
upgrade of the existing one) that you can run into problems.

------
samfisher83
You can kind of do this in C# with StructLayoutAttribute. It's one of the few
high level languages you can do it with.

~~~
pjmlp
Ada, Modula-2, Modula-3, Eiffel, D, Rust, Turbo Pascal, Free Pascal, Delphi,
...

There are more than just a few.

------
Too
In chapter 4, padding outside structure. Isn't the compiler free to do
whatever it wants there? There might not even be a need to reserve memory for
some variables if the optimizer concludes they are irrelevant for the program?

------
gambiting
It's not lost at all - I work as a games dev, and I can assure you that a
major AAA title released this year on all platforms cares a lot about C struct
packing(it's actually part of our code review process!).

------
edem
I've heard about this at a talk 4 years ago and I'm not even a C programmer so
there is hope!

------
zeusk
An article on C structure packing without any mention of
__attribute__((__packed__))?? meh.

~~~
khedoros
> GCC and clang have an attributepacked you can attach to individual structure
> declarations; GCC has an -fpack-struct option for entire compilations.

He mentioned it in passing (and without proper syntax) in section 11.

------
Artlav
Huh? "Lost art"?

That's, like, programming 101 grade material, not some "lost art".

Then again, I worked in telecom and HPC, and play with uCs, so perhaps I'm
wearing the wrong goggles...

~~~
qwertyuiop924
you are. It's not exactly a lost art, but it's not known except to those
inside of the secret, uninviting inner circles of C programming, which you can
only enter if you memorize all the UB.

------
thinkMOAR
Great bit! Still, how often is this going to be reposted?

[https://hn.algolia.com/?query=The%20Lost%20Art%20of%20C%20St...](https://hn.algolia.com/?query=The%20Lost%20Art%20of%20C%20Structure%20Packing&sort=byPopularity&prefix&page=0&dateRange=all&type=story)

~~~
jessaustin
Every time we'd like another round of "Lost art? Why, I packed me some C
structs this morning over breakfast! You young whippersnappers!" comments from
crusty old C dudes.

------
cheez
In this thread: people who do this for a living making people who don't feel
dumb.

Guys, it's called "The Lost Art of C Structure Packing" for a reason. Python
and Ruby guys have no idea.

Even most C or C++ devs wouldn't know.

Take a chill pill.

~~~
tedunangst
That doesn't make it a lost art. I've no clue how to make wine (something to
do with grapes?) but I don't describe it as a lost art.

~~~
cheez
Most programmers would have known this a few years ago. Programmers today
don't. So, it is a lost art.

~~~
dmit
There are more people alive who know this in the year 2016 than at any prior
point in history.

~~~
cheez
I guess

