
The Lost Art of C Structure Packing - Tsiolkovsky
http://www.catb.org/esr/structure-packing/
======
jandrewrogers
I do optimal structure (and bit) packing without much thought because it is an
old habit. As the article states, I have noticed that the only other people
that do habitual careful structure optimization these days have been doing
low-level and high-performance code as long as I have. Most programmers are
oblivious to it.

The reasons you would do it today are different than a decade ago and the
rules have changed because the processors have changed. To add two clarifying
points to the original article:

- The main reason to do optimal structure packing today is to reduce cache
line misses. Because cache line misses are so expensive it is a big net
performance gain in many cases to have the code do a little more work if it
reduces cache line fills; optimal structure packing is basically a "free" way
of minimizing cache misses.

- On modern Intel microarchitectures, alignment matters much less for
performance than it used to. One of the big changes starting with the i7 is
that unaligned memory accesses have approximately the same cost as aligned
memory accesses. This is a pretty radical change to the optimization
assumptions for structure layout. Consequently, it is possible to do very
tight memory packing without the severe performance penalty traditionally
implied.

What constitutes "optimal" structure packing is architecture dependent. The
original C structure rules were designed in part to allow the structures to be
portable above all else. If you design highly optimized structures for a
Haswell processor, code may run much more slowly or create a CPU exception and
crash on other architectures, so keep these tradeoffs in mind. The article is
discussing basic structure packing which typically has easily predictable
behavior almost anywhere C compiles.

~~~
3pt14159
Why doesn't the compiler handle this for the programmer? Seems "free" in the
no-tradeoffs sense.

~~~
colin_mccabe
_Why doesn't the compiler handle this for the programmer? Seems "free" in the
no-tradeoffs sense._

Traditionally, C and C++ programs have been compiled by running the compiler
on each .c file, and then linking together the results. If each invocation of
the compiler could put the fields of a structure in arbitrary order, the
different object files would not work together correctly. C structures do not
include runtime type information and all offsets are calculated during the
compile phase. The same issues come up with libraries. We need a stable
application binary interface, or ABI.

Even if the ABI problem was solved, there are advantages to letting the
programmer determine the order of fields in a structure. If you know that
different threads are going to be using different parts of a structure, you
will want to arrange things (or insert padding) so that the different threads
are using different cache lines. This avoids so-called "false-sharing."
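
A minimal C11 sketch of the padding idea, assuming a 64-byte cache line (typical on x86-64, but an assumption here). The names `CACHE_LINE`, `struct shared_bad`, and `struct shared_padded` are hypothetical and only illustrate the layout difference.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is typical on x86-64 but not universal. */
#define CACHE_LINE 64

/* Both counters will normally share one cache line, so two threads
   updating them independently keep invalidating each other's copy. */
struct shared_bad {
    uint64_t thread_a_counter;
    uint64_t thread_b_counter;
};

/* alignas places each counter on its own CACHE_LINE-aligned boundary,
   at the cost of a much larger struct. */
struct shared_padded {
    alignas(CACHE_LINE) uint64_t thread_a_counter;
    alignas(CACHE_LINE) uint64_t thread_b_counter;
};
```

The padded version trades memory for isolation, which is exactly the kind of decision a reordering compiler could not make on the programmer's behalf.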

~~~
pedrocr
>If each invocation of the compiler could put the fields of a structure in
arbitrary order, the different object files would not work together correctly.

That just requires that your packing algorithm be deterministic and there's no
reason for it not to be.

I suppose it would fail in cases where someone defines a 2 element struct in
one file that's a subset of a 3 element struct in another file, and then casts
the 3 element into the 2 element.

~~~
com2kid
Which fails if I am using two different compilers, or sending a structure over
a network connection!

That is actually why I am familiar with struct packing: when creating
protocols, it is essential that each end has the structure defined and laid
out in exactly the same way.
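
One way to catch a layout mismatch at build time is a set of C11 static assertions. This is only a sketch: `struct wire_msg` and its fields are hypothetical, and byte order still has to be handled separately.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire message; fixed-width types keep the field sizes
   identical across compilers. */
struct wire_msg {
    uint32_t sequence;   /* expected at offset 0 */
    uint16_t type;       /* expected at offset 4 */
    uint16_t length;     /* expected at offset 6 */
};

/* Fail the build if this compiler lays the struct out differently. */
_Static_assert(sizeof(struct wire_msg) == 8, "unexpected padding in wire_msg");
_Static_assert(offsetof(struct wire_msg, type) == 4, "type is not at offset 4");
_Static_assert(offsetof(struct wire_msg, length) == 6, "length is not at offset 6");
```

If both ends compile the same header with these checks, a compiler that pads differently stops the build instead of silently corrupting messages.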

~~~
midas007
Erlang's type system with binary data-type is pretty awesome.

Example: [https://groups.google.com/d/msg/erlang-programming/s39IQlk-d...](https://groups.google.com/d/msg/erlang-programming/s39IQlk-do0/za4eyVWkPwEJ)

I wish more programming languages had baked-in support for bitfields, bit
arrays and endian conversion, because building interoperable, efficient binary
clients without icky code generation is currently a PITA.

------
anatoly
One trick that's not mentioned is unbundling the struct. Suppose you have a
struct with a pointer and a character in it, and a huge array of those
structs. If you resent the padding tax, refactor your code to use and pass
around two arrays instead, one of pointers and the other of chars.
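
A rough sketch of the unbundling trick, with hypothetical names (`struct item`, `items_aos`, `struct items_soa`) and an arbitrary element count `N`:

```c
#include <stddef.h>

#define N 1024  /* hypothetical element count */

/* Array of structs: every element carries padding after `tag`
   so that the next element's `ptr` stays pointer-aligned. */
struct item {
    void *ptr;
    char  tag;
};
struct item items_aos[N];

/* Unbundled: two parallel arrays indexed together,
   with no per-element padding. */
struct items_soa {
    void *ptr[N];
    char  tag[N];
};
```

Element `i` is then `soa.ptr[i]` and `soa.tag[i]` rather than `items_aos[i]`. The cost is that the two halves of one logical record no longer sit next to each other in memory.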

~~~
danieldk
In fact, in Java this is the only way to simulate structs as POD rather than
paying for the overhead of an array of objects.

~~~
wging
Sorry, POD?

~~~
brendano
plain old data

~~~
wging
Ah, thanks.

------
alextingle
Is this a "lost art"? I always consider the layout when I'm writing a C
struct. It's the principal concern that governs the correct ordering of the
members.

~~~
gaius
Agreed. This is just how normal people, err, construct structs.

 _You are not an advanced C programmer until you have grasped it. You are not
a master of C until you could have written this document yourself and can
criticize it intelligently_

In other words, a "hacker" is someone just like ESR. More blowhardery. I for
one don't claim to be a master...

~~~
hobs
To be fair, he didn't claim that by writing this document he was a master of
C, or that a master would only know this document, just that it was a piece of
knowledge a master should definitely have. Phrasing, I guess.

~~~
hdevalence
"A master of C must have the same views of what's important and the same
writing style that I do"

i.e., 'a hacker is someone who looks like me'.

~~~
philh
That is not a charitable interpretation of what he wrote. Try, "A master of C
must already know about structure packing deeply enough to teach it".

~~~
gaius
Try reading some more ESR, that is a common theme in his writing,
unfortunately.

~~~
philh
I agree that "people being uncharitable about ESR" is a common theme.

~~~
mwfunk
No, seriously. This is a common theme in just about everything ESR writes. For
example, for some inexplicable reason, the Jargon File used to have an entry
stating that hackers tend to be libertarian. This, of course, was written when
ESR self-identified as a libertarian. Later on (post-9/11), he decided that he
was a neocon instead. Strangely, the entry in the Jargon File was then updated
to state that hackers tend to be neocons.

That's just one particularly egregious example. There are a gazillion others.
Pretty much everything the guy has ever written has similar implications,
where he defines a hacker as being whatever he sees himself as at the time. He
really really wants to believe that there is a very specific hacker subculture
(in his words, "our tribe"), and that they have a shared culture, heritage,
and beliefs about things, and that they have some sort of meritocracy where
certain members of that subculture are universally agreed upon as being wise
and correct about everything (in his words, "the elders of our tribe"), and of
course, that he is one of those people at the top of the imaginary meritocracy
in this imaginary subculture.

Also, for bonus douchebaggery points, as if he needed them:
[http://esr.ibiblio.org/?p=208](http://esr.ibiblio.org/?p=208)

~~~
philh
> For example, for some inexplicable reason, the Jargon File used to have an
> entry stating that hackers tend to be libertarian. This, of course, was
> written when ESR self-identified as a libertarian. Later on (post-9/11), he
> decided that he was a neocon instead. Strangely, the entry in the Jargon
> File was then updated to state that hackers tend to be neocons.

The current entry:

> Formerly vaguely liberal-moderate, more recently moderate-to-neoconservative
> (hackers too were affected by the collapse of socialism). There is a strong
> libertarian contingent which rejects conventional left-right politics
> entirely. The only safe generalization is that hackers tend to be rather
> anti-authoritarian; thus, both paleoconservatism and ‘hard’ leftism are
> rare. Hackers are far more likely than most non-hackers to either (a) be
> aggressively apolitical or (b) entertain peculiar or idiosyncratic political
> ideas and actually try to live by them day-to-day.

archive.org says this was the same in 2003. (Possibly it changed and changed
back, but I assume not.)

From March 2000 (v4.2.2 at [http://jargon-file.org/archive/](http://jargon-file.org/archive/),
which I selected somewhat at random; I didn't try to find the earliest
"politics" entry):

> Vaguely liberal-moderate, except for the strong libertarian contingent which
> rejects conventional left-right politics entirely. The only safe
> generalization is that hackers tend to be rather anti-authoritarian; thus,
> both conventional conservatism and `hard' leftism are rare. Hackers are far
> more likely than most non-hackers to either (a) be aggressively apolitical
> or (b) entertain peculiar or idiosyncratic political ideas and actually try
> to live by them day-to-day.

ESR in 2008 ([http://esr.ibiblio.org/?p=301](http://esr.ibiblio.org/?p=301)):

> I am not and have never been a conservative. Much less a “neocon”, whatever
> that means.

"Hackers tend to be libertarian" and "hackers tend to be neocons" are not
sentiments expressed by either variant. "He decided that he was a neocon" just
seems to be plain false. Whatever the merits of the changes he made, I claim
that you are being uncharitable towards him.

I'm not necessarily defending ESR himself, I just think that you're attacking
someone who isn't ESR and calling them ESR. And I think this is a common theme
when people talk about ESR.

~~~
djur
Before 2001 it was "vaguely liberal-moderate... strong libertarian
contingent... anti-authoritarian", when ESR identified as a libertarian.

After 2001 it was "formerly liberal-moderate... moderate-to-neoconservative",
when ESR had gone full-on warblogger. Around the same time he also added
heavily politically biased definitions for terms like "fisking" and
"idiotarian" to the Jargon File.

He claimed not to be a neocon in 2008 when "neocon" had become a dirty word. I
don't think that changes the fact that he was a fervent supporter of the "War
on Terrorism", a neocon project.

It sounds to me like you're agreeing with the parent -- he changed the
description of a "hacker's" politics to fit whatever his particular political
stance was at the time.

~~~
prodigal_erik
[http://en.wikipedia.org/wiki/Fisking](http://en.wikipedia.org/wiki/Fisking)
points out that the style of rebuttal had been common on Usenet for years,
there just wasn't a word for it until 2001. I don't think most people are
familiar enough with Fisk to give it any political connotations.

I agree he should stop trying to make "idiotarian" happen and it's completely
out of place in the File.

~~~
djur
From the ESR Jargon File: "Named after Robert Fisk, a British journalist who
was a frequent (and deserving) early target of such treatment." He was
"deserving" because warbloggers like ESR hated him.

------
drdaeman
This could be partially automated with `__attribute__((__packed__))` and a bit
of -fipa-struct-reorg for better cache performance. Sadly, there isn't any kind
of `__reorder_yes_i_know_and_i_want_to_violate_c_standard__` attribute. But I
really believe managing and optimizing memory layout (unless explicitly
necessary, like when declaring serialization formats) should be the compiler's
job, not a human's.

~~~
JoachimSchipper
I'm not sure letting the compiler go wild would be such a great idea: one of
the strengths of C is predictable performance, which would be hard to obtain
if the compiler is allowed to e.g. move data across cache lines.

~~~
_delirium
If you're running on one microarchitecture, I agree. But if you're running on
more than one, manual structure packing may actually give more unpredictable
performance than letting GCC handle it. At least GCC will make an effort to
optimize for each one and hopefully avoid things that are absolutely terrible
to do on that arch (orders-of-magnitude performance loss type stuff), while a
manually chosen packing optimized for one microarch can be hugely pessimal on
another one.

That's a guess though, no numbers. :)

------
rwmj
He should mention this tool:

[http://linux.die.net/man/1/pahole](http://linux.die.net/man/1/pahole)

~~~
JoachimSchipper
gcc -Wpadded is very useful, too (its chief downside is that you don't usually
want to combine it with -Werror - padding is not always bad.)

~~~
agwa
You can always add -Wno-error=padded to disable -Werror just for -Wpadded

~~~
JoachimSchipper
I seem to have missed that trick. Thanks!

------
solarexplorer
If your struct covers more than one cache line, you may want to think about
which members are accessed together and put those in the same cache line. E.g.
if you manage to fit all frequently accessed members in the first cache line,
you will bring likely useful data into the cache when you access any of them.
At the same time you avoid cache pollution with not so useful data from other
cache lines.

~~~
jkrems
For people like myself who need a short reminder what a "cache line" is:

* [http://en.wikipedia.org/wiki/CPU_cache#Cache_entries](http://en.wikipedia.org/wiki/CPU_cache#Cache_entries)

* [http://stackoverflow.com/questions/14707803/line-size-of-l1-...](http://stackoverflow.com/questions/14707803/line-size-of-l1-and-l2-caches)

------
waynecochran
Classic use case: creating packed structures to directly read TCP packets.
There is still the "convert from big to little endian" problem, but I think
you can use GCC's method for individually packing a struct:

    
    
      struct TCP_Packet {
        uint16_t source_port;
        uint16_t dest_port;
        uint32_t seq_no;
        uint16_t flags;
        uint16_t window_size;
        uint16_t checksum;
        uint16_t urgent;
        ...
      }  __attribute__ ((packed));
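
For the big-to-little-endian part, one alternative to overlaying a packed struct on the buffer is to decode each field byte by byte. This avoids both the alignment and the byte-order question. The sketch below is not the approach from the comment above; `struct tcp_fields`, `read_be16`, `read_be32`, and `decode_tcp_fields` are hypothetical names, and only the first three header fields are handled.

```c
#include <stdint.h>

/* Read a 16-bit big-endian (network order) value from a byte buffer. */
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Host-order copy of the first few header fields (hypothetical struct). */
struct tcp_fields {
    uint16_t source_port;
    uint16_t dest_port;
    uint32_t seq_no;
};

/* Decode from raw packet bytes; the caller must supply at least 8 bytes. */
static struct tcp_fields decode_tcp_fields(const unsigned char *pkt)
{
    struct tcp_fields f;
    f.source_port = read_be16(pkt);
    f.dest_port   = read_be16(pkt + 2);
    f.seq_no      = read_be32(pkt + 4);
    return f;
}
```

The result is the same on big- and little-endian hosts, because the shifts work on values rather than on memory layout.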

~~~
throwaway2048
many packet/wire formats are conceived as structs when they are designed

------
JoachimSchipper
Note that the key takeaway is just to order struct members by size, largest
first; this isn't always sufficient, but it's a good habit wherever
performance could matter.

~~~
nnethercote
Yeah. It's an incredibly verbose document, with that key insight buried in a
mere two sentences near the end:

> The simplest way to eliminate slop is to reorder the structure members by
> decreasing alignment. That is: make all the pointer-aligned subfields come
> first, because on a 64-bit machine they will be 8 bytes. Then the 4-byte
> ints; then the 2-byte shorts; then the character fields.
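
A small illustration of that rule, using hypothetical structs `struct scattered` and `struct sorted` that hold the same members in different orders:

```c
#include <stddef.h>

/* Declaration order forces padding after each small member on platforms
   where pointers need stricter alignment than char or short. */
struct scattered {
    char  c;
    void *p;
    short s;
    void *q;
    char  d;
};

/* Same members, ordered by decreasing alignment. */
struct sorted {
    void *p;
    void *q;
    short s;
    char  c;
    char  d;
};
```

On a typical x86-64 Linux build, `sizeof(struct scattered)` is 40 and `sizeof(struct sorted)` is 24; the exact numbers depend on the platform ABI.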

------
overgard
Just a feature idea: I'd love an IDE plugin (for something like visual studio
or eclipse) that would show this sort of information. Sort of like a
disassembly, but for structs instead. I can imagine lots of industries where
they'd pay for that sort of thing (IE, game programmers or OS developers,
etc.)

~~~
aidenn0
Embedded IDEs already do this (showing the offsets of elements in a structure
in a column when you are viewing structs).

~~~
sitkack
Do you have some screenshot examples? I like the idea of this feature.

------
danellis
__attribute__((packed)) in GCC is probably worth mentioning if you only want
to apply it to selected structs.

~~~
YZF
#pragma pack(push)

#pragma pack(1)

MSVC

~~~
pjscott
GCC also supports that pragma, for compatibility with MSVC:

[http://gcc.gnu.org/onlinedocs/gcc/Structure-Packing-Pragmas....](http://gcc.gnu.org/onlinedocs/gcc/Structure-Packing-Pragmas.html)

Clang has this, too.

~~~
userbinator
And MSVC doesn't support GCC's pragma, so for optimal portability #pragma
pack(1) is my preferred way of doing it.
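
A short sketch of the pragma form, limited to one struct by pairing `push` with `pop`. The struct names are hypothetical; the combined `#pragma pack(push, 1)` spelling is accepted by MSVC, GCC, and Clang.

```c
#include <stdint.h>

/* Default layout: the compiler normally pads after `tag` to align `value`. */
struct record_default {
    uint8_t  tag;
    uint32_t value;
};

#pragma pack(push, 1)   /* save the current packing, then pack to 1 byte */
struct record_packed {
    uint8_t  tag;
    uint32_t value;     /* now at offset 1, so possibly unaligned */
};
#pragma pack(pop)       /* restore the previous packing for later structs */
```

Forgetting the `pop` silently packs every struct declared afterwards, including those in headers included later, so keeping the pair close together is a sensible habit.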

------
stinos
small typo, under 4 it says _a 54-bit machine._

small remark: _this will cover dates to 2050_

That's a bit of a gamble; I'm not sure I would have done that myself. It's
probably OK, but you never know: in 40 years CVS and your code may still be in
use somewhere, and you'll cause some people headaches :]

~~~
Dylan16807
It's especially weird as an optimization when an unsigned int would last until
2118. (Or 2106 if the weird base-changing was removed)
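
The 2106 figure can be checked with a little arithmetic, assuming a plain count of seconds since 1970 and an average Gregorian year. `SECONDS_PER_YEAR` and `last_year` are hypothetical names for this sketch; it does not cover the base-changing variant.

```c
#include <stdint.h>

/* Average Gregorian year: 365.2425 days * 86400 seconds. */
#define SECONDS_PER_YEAR 31556952u

/* Last calendar year reachable by a counter of seconds since 1970 whose
   largest value is max_seconds (approximate: ignores leap seconds). */
static unsigned last_year(uint32_t max_seconds)
{
    return 1970u + max_seconds / SECONDS_PER_YEAR;
}
```

`last_year(UINT32_MAX)` gives 2106, while `last_year(INT32_MAX)` gives 2038, the familiar signed 32-bit limit.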

~~~
stinos
excellent point.

------
ChuckMcM
Interesting timing: I've been playing around with Cortex M chips, and a _lot_
of the demo code has structures that are almost pathologically misaligned,
things like

    
    
       struct SomeCamelCaseNonsenseStructure {
           uint8_t flag;
           uint32_t pointer;
           uint8_t another_flag;
           uint32_t another_pointer;
           ...
       };
    

I feel like flinching every time I read it.

------
mcguire
For those of you playing along at home, Rust currently manually interoperates
with C structures. This[1] is one of the struct stat structures:

    
    
                    pub struct stat {
                        st_dev: dev_t,
                        __pad1: c_short,
                        st_ino: ino_t,
                        st_mode: mode_t,
                        st_nlink: nlink_t,
                        st_uid: uid_t,
                        st_gid: gid_t,
                        st_rdev: dev_t,
                        __pad2: c_short,
                        st_size: off_t,
    

Note the __pad elements.

[1]
[https://github.com/mozilla/rust/blob/master/src/libstd/libc....](https://github.com/mozilla/rust/blob/master/src/libstd/libc.rs#L369-L390)

------
bjornsing
> _If the compiler happened to map [the first member of a struct] c to the
> last byte of a machine word, the next byte (the first of p) would be the
> first byte of the next one and properly pointer-aligned._

No, a compiler can't do this. You have to assume that the first member of a
struct will be machine word aligned. I'm sure there are many reasons, but the
one I can think of now is that structs can be dynamically allocated. That
means it has to be possible to take the return value of malloc() and assign it
to a pointer, and there is no way to get malloc() to return a memory block
with that weird alignment (starting on the last byte of a machine word).
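
The malloc() argument can be checked directly: the first member of a struct is at offset 0, and malloc() returns storage aligned for any standard type. A minimal C11 sketch, where `struct foo`, `new_foo`, and `is_max_aligned` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct with a char first, as in the quoted passage. */
struct foo {
    char  c;
    char *p;
};

/* Returns 1 if ptr is aligned strictly enough for any standard type. */
static int is_max_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % _Alignof(max_align_t)) == 0;
}

/* Allocate one struct foo; the caller must free() the result. */
static struct foo *new_foo(void)
{
    return malloc(sizeof(struct foo));
}
```

Since `offsetof(struct foo, c)` is always 0, `c` starts exactly where the aligned block starts, and the padding has to come after it.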

------
userbinator
Coming from an 8-bit assembly background, I've always found this alignment
size/speed tradeoff to be a little vexing (either it's slower, or we're
wasting memory), and thought "couldn't the hardware be better?" so finding out
that x86 has evolved to the point where unaligned accesses basically have no
penalty was really pleasing. No more bytes wasted, so we can have small _and_
fast. (How do they do this? By making the memory bus a lot wider than a word,
among other things.) IMHO ARM is just starting to see the value of this and
catching up, if only the other architectures would do the same...

------
soup10
Is there an easy way to get the compiler to show what padding it's using for a
structure? (other than filling the structure with marker values then dumping
the data and picking through it with a hex editor)

~~~
jgale
The pahole(1) utility, mentioned above.
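
Without extra tools, `offsetof` and `sizeof` can also print the layout from inside the program. A rough sketch, where `struct sample`, the `SHOW` macro, and `dump_sample_layout` are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct to inspect. */
struct sample {
    char  c;
    long  l;
    short s;
};

/* Print one member's offset and size; a gap between the end of one row
   and the offset of the next is padding. */
#define SHOW(type, member) \
    printf("%-8s offset %2zu size %zu\n", #member, \
           offsetof(type, member), sizeof(((type *)0)->member))

static void dump_sample_layout(void)
{
    SHOW(struct sample, c);
    SHOW(struct sample, l);
    SHOW(struct sample, s);
    printf("total    size %zu\n", sizeof(struct sample));
}
```

Any difference between the total size and the end of the last member is trailing padding.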

------
Dylan16807
>It’s still 24 bytes because c cannot back into the inner struct’s trailing
padding. To collect that gain you would need to redesign your date structures.

Well that's disappointing. Is there any [nonstandard] way to get a compiler to
take advantage of that space?

------
jeremiep
I tend to _only_ use packed structures when I need to match a binary format or
know I'll create huge arrays of structures.

For the rest I blindly trust the compiler to choose optimal boundaries and
alignments.

~~~
dfox
The problem with that approach is that the compiler cannot optimize the
struct field layout at all, as it's completely specified by the platform ABI
and the order in which the fields are declared.

------
jeffdavis
This is still very practical when considering disk layouts of the structures.
You want something compact enough on disk, but when it comes into memory, you
still benefit from aligned access.

------
xarien
There are still statically typed languages used today, like Ada, although even
within Ada you'll often see programmers package spare buffers into structures
to increase flexibility.

------
bobowzki
This is very useful when programming microcontrollers. Especially if you
create many structs in an array. Many parallel state machines or something
like that.

------
jheriko
this is indeed somewhat lost. i learned it from coding standards at one place
a long time ago... but since then I've had to explain it x number of times to
juniors to the point where I wrote an article to point them at instead:

[http://jheriko-rtw.blogspot.co.uk/2011/02/know-your-cc-struc...](http://jheriko-rtw.blogspot.co.uk/2011/02/know-your-cc-struct-layout-rules.html)

------
DigitalJack
I recently had to do this (padding) as I was using memcpy to load a struct
from reading a binary file for a personal project.

------
xkarga00
Why does "sizeof(struct foo5)" in packtest.c return 8 instead of 6?

------
jjacobson
This is the kind of stuff they teach you in college with a CS degree.

~~~
Moto7451
Not universally. None of my university's Java based curriculum or my brother's
university's Python based curriculum mentioned this. My community college's
C++ based curriculum mentioned it in passing.

I seem to remember it being mentioned in Bruce Eckel's excellent "Thinking in
C++".

That said, I'm kinda curious how much this affects Objective C objects.

~~~
krapp
Never mentioned once in my curriculum at all...

