
The Lost Art of Structure Packing (2018) - theastrowolfe
http://www.catb.org/esr/structure-packing/
======
munchbunny
If you do any development that requires touching Win32, structure packing and
memory alignment are still very real.

I was recently doing that in the context of improving a language’s library
support for something that had to go through the OS SDK. I don’t miss the days
when that stuff was the norm.

~~~
BubRoss
I have done some win32 programming but haven't encountered struct packing or
alignment being an issue, where does that pop up?

~~~
gambiting
I work in video games, and very recently we had a sneaky bug in one of our AAA
titles (that was already out!), where (hugely simplified) we had a struct
that looked like:

    
    
      struct Obj
      {
        int foo;
        bool bar;
      };
    

then we were storing those in a custom hashmap using these as keys, where the
hashing function was basically hashing bits of each stored object, without any
awareness of what's in the object.

The bug was found when someone did something like:

    
    
      void DoSomething(Obj obj)
      {
        if (!map.has(obj))
          map.add(obj, new Whatever());

        map[obj]->blah(); // CRASH: null pointer dereference
      }
    

I was like.....well, if there is no key "obj" in the map, we insert one....and
yet literally one line after it doesn't have a value for that key??? How can
this be?

Well, it can be because even though the struct looks like it takes 5 bytes, in
reality it's 8 bytes because it's getting padded. So a naive hashing method
that just looks at bits is hashing your 5 bytes of actual data + 3 bytes of
garbage, which means that two "identical" objects are very unlikely to
actually produce the same hash.
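
A minimal sketch of the size arithmetic described above, assuming a typical ABI with a 4-byte `int` and a 1-byte `bool` (the helper `padding_bytes` is a hypothetical name, not from the original code):

```cpp
#include <cstddef>   // offsetof, std::size_t

// Same shape as the struct in the comment above.
struct Obj {
    int  foo;   // offset 0, 4 bytes on common ABIs
    bool bar;   // offset 4, 1 byte
    // tail padding typically follows so that sizeof(Obj) is a multiple of alignof(int)
};

// Bytes that belong to no member: total size minus the sum of the member sizes.
constexpr std::size_t padding_bytes() {
    return sizeof(Obj) - (sizeof(int) + sizeof(bool));
}
```

On x86-64 with GCC, Clang or MSVC this gives `sizeof(Obj) == 8` and three padding bytes, which is exactly the garbage a byte-wise hash ends up reading.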

C++17 added std::has_unique_object_representations, which can flag types like
this at compile time, but it still requires the programmer to be aware of
structure packing.
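
A minimal sketch of a compile-time guard, assuming a C++17 compiler. The standard trait `std::has_unique_object_representations` reports whether two objects with equal values are guaranteed to have identical bytes; the alias `safe_to_hash_bytes` and the struct `Tight` are hypothetical names for illustration:

```cpp
#include <type_traits>

struct Obj   { int foo; bool bar; };  // has tail padding on common ABIs
struct Tight { int foo; int  bar; };  // no padding on common ABIs

// True only when hashing the raw bytes of T cannot be affected by padding.
template <typename T>
constexpr bool safe_to_hash_bytes = std::has_unique_object_representations_v<T>;
```

A byte-wise hash function could `static_assert(safe_to_hash_bytes<T>)` and refuse to compile for types like `Obj`.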

~~~
bregma
If you rely on code containing undefined behaviour you're in for a world of
butthurt sooner rather than later.

There is no way in C++ to get at the padding bytes unless you're using
undefined behaviour. How does the hash function work? Pointer aliasing using
reinterpret_cast? Pointer aliasing using C-style casts? Type punning through
the old union switcheroo?

~~~
gambiting
I don't have the code in front of me, but something like

    
    
      int hash = 0;
      for (std::size_t i = 0; i < sizeof(obj); i++)
        hash += hashing_method(reinterpret_cast<const char*>(&obj)[i]); // padding bytes included
      return hash;
    

Basically hashing each byte of the memory containing the object, regardless of
what the object itself represents.

We can argue whether that's a smart thing to do or not, but I wasn't in charge
of implementing it - it's a relic from a codebase that's more than a decade
old at this point. It's a simple hashing method that works with most types,
but obviously dies horrendously in a case like this.

~~~
nly
This doesn't help you, because the contents of the padding bytes are not
guaranteed to be anything in particular. Two structs containing identical
field values can have different padding bytes. Reading the padding bytes is
UB.

Just pack the struct and be done with it.
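
A minimal sketch of what packing looks like, assuming a compiler that supports `#pragma pack` (GCC, Clang and MSVC do) and a 4-byte `int`. The names `PackedObj` and `PaddedObj` are hypothetical:

```cpp
#include <cstddef>

#pragma pack(push, 1)   // request 1-byte alignment for the members that follow
struct PackedObj {
    int  foo;
    bool bar;
};
#pragma pack(pop)       // restore the default packing

// Same members with default layout, for comparison.
struct PaddedObj {
    int  foo;
    bool bar;
};
```

With no padding left, a byte-wise hash sees only real data. The trade-off is that `foo` may now be misaligned inside arrays of `PackedObj`, which can be slower or, on some architectures, unsafe to access through a plain `int*`.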

------
kazinator
I've documented how GCC does bitfield packing in the TXR reference manual. Or
rather, the abstract algorithm used in the FFI to replicate it.

[https://www.nongnu.org/txr/txr-manpage.html#N-027D075C](https://www.nongnu.org/txr/txr-manpage.html#N-027D075C)

This is the result of empirical investigation.

The description also covers allocation of non-bitfields (paragraph 3) and the
padding of the structure (paragraph 9) which require few words.

I felt that the bitfield handling is so obscure that it had to be documented
in detail. If someone is to know exactly what the layout will be, the
documentation can't just be "oh, it will behave like a GCC struct". Well, what
will _that_ do? That is not adequately documented anywhere.

If I have to work with bitfields in just C, I can use that as a reference to
understand what the compiler will do (at least if it's anything compatible
with GCC).

~~~
not2b
You could have saved time and gone to the ABI spec.

[https://itanium-cxx-abi.github.io/cxx-abi/abi.html](https://itanium-cxx-abi.github.io/cxx-abi/abi.html)

This was originally developed as a joint effort to make compilers
ABI-compatible on Itanium, but it's also used (by GCC, clang, Intel's
proprietary compiler and others) on x86-64.

An old Hacker News comment said that it's from Intel; it's not, it was a joint
effort with lots of work from CodeSourcery and Red Hat folks.

~~~
kazinator
The document you link to is entirely about C++, and makes numerous references
to some "base (C) ABI".

_"The size and alignment of a type which is a POD for the purpose of layout
is as specified by the base (C) ABI"_

The links in 1.5 Base Documents are old and broken.

------
0xDEEPFAC
Looks like what he really wants is Ada, which has long had much better
support for low-level programming than C.

Example:

    Word : constant := 4;  -- storage element is byte, 4 bytes per word

    type State is (A,M,W,P);
    type Mode  is (Fix, Dec, Exp, Signif);

    type Byte_Mask  is array (0..7)  of Boolean;
    type State_Mask is array (State) of Boolean;
    type Mode_Mask  is array (Mode)  of Boolean;

    type Program_Status_Word is
      record
          System_Mask        : Byte_Mask;
          Protection_Key     : Integer range 0 .. 3;
          Machine_State      : State_Mask;
          Interrupt_Cause    : Interruption_Code;
          Ilc                : Integer range 0 .. 3;
          Cc                 : Integer range 0 .. 3;
          Program_Mask       : Mode_Mask;
          Inst_Address       : Address;
      end record;

    for Program_Status_Word use
      record
          System_Mask      at 0*Word range 0  .. 7;
          Protection_Key   at 0*Word range 10 .. 11; -- bits 8,9 unused
          Machine_State    at 0*Word range 12 .. 15;
          Interrupt_Cause  at 0*Word range 16 .. 31;
          Ilc              at 1*Word range 0  .. 1;  -- second word
          Cc               at 1*Word range 2  .. 3;
          Program_Mask     at 1*Word range 4  .. 7;
          Inst_Address     at 1*Word range 8  .. 31;
      end record;

    for Program_Status_Word'Size use 8*System.Storage_Unit;
    for Program_Status_Word'Alignment use 8;
    

More info:
[https://www.adaic.org/resources/add_content/standards/05aarm...](https://www.adaic.org/resources/add_content/standards/05aarm/html/AA-13-5-1.html)

~~~
samatman
For all its somewhat fussy verbosity, Ada really impresses me every time this
sort of thing comes up.

I keep hoping the Zig developer will do a deep dive on Ada and bring over more
of this kind of precise control. A language where I have this kind of control
over layout, but can still spell `end record;` as `}`, is ideal for some
projects I have in mind.

------
grandinj
I wrote a clang plugin to look for struct-packing opportunities across a 10M
line codebase and there was surprisingly little to be found. Why? Because on
64-bit Linux,
the current C++ ABI mandates quite large alignment, especially once you are
embedding things inside other things. Packing is still sometimes useful in
speeding things up, but tends to require bitfields and flattening structs
inside structs into a single struct, etc.
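
A minimal sketch of the flattening effect mentioned above, assuming a typical 64-bit ABI where `std::int64_t` is 8-byte aligned. The struct names are hypothetical:

```cpp
#include <cstdint>

// 9 bytes of data, padded to 16 so that arrays of Header stay aligned.
struct Header { std::int64_t id; char tag; };

// The extra member cannot reuse Header's tail padding, so the size grows to 24.
struct Nested { Header header; char flag; };

// Flattened: flag sits next to tag inside the same 8-byte slot, total 16.
struct Flat { std::int64_t id; char tag; char flag; };
```

On x86-64 Linux the sizes are 16, 24 and 16 bytes respectively, so embedding costs a full extra alignment unit for one `char`.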

------
garjana
When I worked at a prop trading firm on a greenfield market data system this
kind of optimization was very much on our minds. I assume others in this field
also take care to pack structs.

------
adrianN
It would be nice if there was an annotation that just lets the compiler do all
the optimization for me for the cases where I don't care about the memory
layout of the struct, just like the Rust compiler does without repr(C).

~~~
elteto
Wait, forgive my ignorance, isn’t this the default? If you don’t care about
the memory layout then won’t the compiler reorg / pad structure members to
fulfill alignment?

~~~
flohofwoe
Nope, C and C++ compilers are not allowed to _reorder_ struct fields (AFAIK at
least, I haven't seen this yet in any real world compiler), but they will add
padding bytes for natural alignment (and that's where the "waste" is coming
from).

IMHO there are just as many arguments for automatic reordering as there are
against it (e.g. creating structs that are laid over memory mapped IO
registers, or just optimizing a struct for certain cache-efficient access
patterns). In my opinion it's sufficient to know about the existence of
alignment-padding, and how to work around it if needed (for instance
reordering the struct members manually, or using #pragma pack)
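
A minimal sketch of manual reordering, assuming a typical 64-bit ABI where `double` is 8-byte aligned. The struct names are hypothetical:

```cpp
// Declaration order forces padding after each char: 1 + 7 + 8 + 1 + 7 = 24 bytes.
struct Unordered {
    char   a;
    double b;
    char   c;
};

// Largest member first: 8 + 1 + 1 + 6 bytes of tail padding = 16 bytes.
struct Reordered {
    double b;
    char   a;
    char   c;
};
```

The members and their types are identical; only the declaration order changes, and a third of the size goes away.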

~~~
TorKlingberg
Yes, the C and C++ standards could have allowed the compiler to reorder
structs, but that would have led to even more of those undefined behavior
situations that people complain so much about.

People do quite often rely on the first struct element being at the start of
the struct. Memcpy:ing directly between structs and network/disk is also
common, but naughty. Both struct padding and endianness already break that.
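
A hedged sketch of the alternative to copying a struct straight onto the wire: write each field explicitly in a fixed byte order. The `Message` type and its 5-byte big-endian wire format are hypothetical, chosen only for illustration:

```cpp
#include <array>
#include <cstdint>

// In memory this struct is typically 8 bytes (3 bytes of padding after kind).
struct Message { std::uint8_t kind; std::uint32_t length; };

// Hypothetical wire format: kind (1 byte), then length as 4 big-endian bytes.
inline std::array<std::uint8_t, 5> encode(const Message& m) {
    return {
        m.kind,
        static_cast<std::uint8_t>(m.length >> 24),
        static_cast<std::uint8_t>(m.length >> 16),
        static_cast<std::uint8_t>(m.length >> 8),
        static_cast<std::uint8_t>(m.length),
    };
}

inline Message decode(const std::array<std::uint8_t, 5>& b) {
    Message m;
    m.kind   = b[0];
    m.length = (std::uint32_t{b[1]} << 24) | (std::uint32_t{b[2]} << 16) |
               (std::uint32_t{b[3]} << 8)  |  std::uint32_t{b[4]};
    return m;
}
```

Because the bytes are produced by shifts rather than by the in-memory representation, the result does not depend on the host's padding or endianness.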

------
Natsu
I'm sort of surprised there are no tools for this. I understand why having the
compiler reorder things could be bad, though it seems like there should be
room to tell the compiler it's okay to repack it for minimum space, but I
don't even see any mention of a source-level tool that would just sort the
items in a struct for you.

It seems like something like that could be useful rather than making
programmers try to order their structs by hand.

~~~
SlowRobotAhead
If you just order top down in structures from pointers, 32s, 16s, arrays, 8s,
it’s almost entirely done without thinking.

There is almost never a difference to the user in what order things are
laid out. Although to be fair this does get tricky with unions of structure
over structure.

~~~
Natsu
True, but that's the kind of drudgery that's best farmed off to computers. I
see that the comment above mentions there is a tool for this that I simply
wasn't aware of.

------
dorianh
I always assumed that in C, a given structure would always have the same
memory layout, no matter what the compiler is (as long as the compilers target
the same architecture of course).

I always assumed that in C++, the memory layout could change a lot between
compilers (the location of the pointer to the vtable for example). Do you know
if it's true, and if the layout does change, could you give an example?

~~~
saagarjha
> I always assumed that in C, a given structure would always have the same
> memory layout, no matter what the compiler is (as long as the compilers
> target the same architecture of course).

Usually, but not always. In most cases there is one efficient way to pack the
structure and still maintain member alignment, but there's no requirement
that a compiler use exactly that amount of padding.

> I always assumed that in C++, the memory layout could change a lot between
> compilers (the location of the pointer to the vtable for example). Do you
> know if it's true, and if the layout does change, could you give an example?

Yes, once you have a non-POD type the memory layout can be fairly arbitrary as
you cannot really inspect it and compilers are free to lay it out as they
wish.
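
A minimal sketch of how to ask the compiler which side of that line a type falls on, assuming C++17. `std::is_standard_layout` is the standard trait; the two structs are hypothetical examples:

```cpp
#include <type_traits>

// Plain data: layout follows the platform's C ABI rules.
struct Plain { int x; char y; };

// A virtual function makes the type non-standard-layout; where the hidden
// vtable pointer goes is up to the C++ ABI the compiler implements.
struct Polymorphic {
    virtual ~Polymorphic() = default;
    int x;
};

static_assert(std::is_standard_layout_v<Plain>);
static_assert(!std::is_standard_layout_v<Polymorphic>);
```

Only for standard-layout types does the language guarantee things like the first member sitting at offset zero.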

------
throwaway_pdp09
It's probably not talked about so much now because of the canard that memory
is cheap, and because HLLs disguise things a bit too much and so mislead
newbies, but to suggest it's lost is plain wrong.

~~~
saagarjha
I think the main reason is that many new languages do this for you, at the
cost of not guaranteeing a particular structure member order.

~~~
WesolyKubeczek
Sheesh, we're trying to be ABI-compatible over here.

~~~
saagarjha
For interoperability concerns within the language itself, this is usually
solved by making the layout algorithm deterministic or passing the
aggregate around with an invisible pointer. When interacting with other
languages, typically there's an attribute to ensure that layout matches
declaration order.

------
hellofunk
Biggest surprise for me is learning that a pointer is a whopping eight bytes!
I have always mistakenly assumed they were small and just four bytes.

~~~
saagarjha
It depends on the platform! On ILP64 and LP64 (and presumably P64, though I've
never heard of this actually being used anywhere) they'll be 8 bytes, but not
on most 32-bit architectures.
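
A minimal sketch for checking the data model of whatever platform you compile for. The helper names `pointer_bits` and `long_bits` are hypothetical:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Width of a data pointer in bits on the target platform.
constexpr std::size_t pointer_bits() { return sizeof(void*) * CHAR_BIT; }

// Width of long in bits; differs between LP64 (64) and LLP64 (32).
constexpr std::size_t long_bits() { return sizeof(long) * CHAR_BIT; }
```

On 64-bit Linux and macOS (LP64) both are 64; on 64-bit Windows (LLP64) pointers are 64 bits while `long` stays at 32.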

~~~
flyingfences
Isn't that the _definition_ of a 32-bit (or 64-bit or 8-bit or
however-many-bit) platform: that 32 bits is the length of a pointer?

~~~
saagarjha
Certain strange platforms (arm64_32 for Apple Watch) run ILP32 on AArch64.

------
joosters
TLDR: When defining a struct, put the biggest items first. This works well in
almost all cases.

If you need to squeeze out all padding, use compiler directives like '#pragma
pack', but be aware of the performance implications.

