The Lost Art of Structure Packing (2018)

munchbunny · on April 27, 2020

If you do any development that requires touching Win32, structure packing and memory alignment is still very real.

I was recently doing that in the context of improving a language’s library support for something that had to go through the OS SDK. I don’t miss the days when that stuff was the norm.

BubRoss · on April 27, 2020

I have done some win32 programming but haven't encountered struct packing or alignment being an issue, where does that pop up?

gambiting · on April 27, 2020

I work in video games, and very recently we had a sneaky bug in one of our AAA titles(that was already out!), where(in huge simplification) we had a struct that looked like:

  struct Obj
  {
    int foo;
    bool bar;
  }

then we were storing those in a custom hashmap using these as keys, where the hashing function was basically hashing bits of each stored object, without any awareness of what's in the object.

The bug was found when someone did something like:

  void DoSomething(Obj obj)
  {
    if(!map.has(obj))
      map.add(obj, new Whatever());
  
    map[obj].blah(); //CRASH null pointer exception
  }

I was like.....well, if there is no key "obj" in the map, we insert one....and yet literally one line after it doesn't have a value for that key??? How can this be?

Well, it can be because even though the struct looks like it takes 5 bytes, in reality it's 8 bytes because it's getting padded. So a naive hashing method that just looks at bits is hashing your 5 bytes of actual data + 3 bytes of garbage, which means that two "identical" objects are very unlikely to actually produce the same hash.

C++20 now has a "hashable" concept to help with this, but it still requires the programmer to be aware of structure packing.

saalweachter · on April 27, 2020

In my first job, there was an interesting bug introduced by an un-terminated pragma-pack in a header file.

Because not every structure was packed, depending on your include order, some structures would be packed differently in different compilation units.

Except, for the only structures this happened for, the packed packing was coincidentally the same as the default packing ... when the program was compiled 32-bits. Attempting to switch from a 32-bit executable to a 64-bit executable resulted in mysterious segfaults as different compilation units disagreed on where different fields were.

Fun times!

gambiting · on April 27, 2020

>> some structures would be packed differently in different compilation units.

karlding · on April 27, 2020

This is why I always use static_assert [0] to assert on the struct size in projects using C11+, or use a macro to create my own (using the negative array size trick) to cause a compile error if I can't use C11 as a sanity check for these cases. This saves a lot of potential headaches.

[0] https://en.cppreference.com/w/c/language/_Static_assert

flohofwoe · on April 27, 2020

It's not just the different "under the hood size" that's a problem in this situation, but alignment-padding bytes added by the compiler inbetween struct members will have random junk data in them, the compiler will not zero-initialize padding bytes, unless you explicitely memset() the struct (and even then I wouldn't count on that the padding bytes aren't "tainted" later).

E.g. if you initialize a C struct or C++ object the "usual way":

Obj obj = { };

There will most definitely be junk in the padding bytes.

steerablesafe · on April 27, 2020

Zeroing padding bytes is an interesting scenario. The C++ standard does mandate zeroing padding bytes in certain cases[1], but some compilers do not conform. It's especially a nasty issue when partially overlapping subobjects and RVO are added to the mix. There is a possible defect in the C++ standard here.

[1] https://eel.is/c++draft/dcl.init#6.2

bregma · on April 27, 2020

If you rely on code containing undefined behaviour you're in for a world of butthurt sooner rather than later.

There is no way in C++ to get at the padding bytes unless you're using undefined behaviour. How does the hash function work? Pointer aliasing using reinterpret_cast? Pointer aliasing using C-style casts? Typing punning through the old union switcheroo?

gambiting · on April 27, 2020

I don't have the code in front of me, but something like

  int hash=0;
  for(int i=0;i<sizeof(obj);i++)
   hash += hashing_method(reintepret_cast<char*>(&obj)+i);
  return hash;

Basically hashing each byte of the memory containing the object, regardless of what the object itself represents.

We can argue whether that's a smart thing to do or not, but I wasn't in charge of implementing it - it's a relic from a codebase that's more than a decade old at this point. It's a simple hashing method that works with most types, but obviously dies horrendously in a case like this.

nly · on April 27, 2020

This doesn't help you, because the contents of the padding bytes is not guaranteed to be anything in particular. Two structs containing identical field values can have different padding bytes. Reading the padding bytes is UB.

Just pack the struct and be done with it.

saagarjha · on April 27, 2020

Yeah, this is technically undefined.

Asooka · on April 27, 2020

You can trivially get at the padding bytes, just cast to char pointer. Their contents are undefined and there is absolutely no guarantee that they will remain constant, but they're there.

Generally you have several possibilities for how to escape this problem, but the simplest is to just add the padding and a static assert that the sizeof the struct is what you expect.

saagarjha · on April 28, 2020

> Their contents are undefined

No, their contents has an unspecified value.

RealityVoid · on April 27, 2020

Ok, this will sound stupid then, but is pointer aliasing using casts undefined?

saagarjha · on April 28, 2020

Not stupid–pointer aliasing using a cast is undefined if the types do not match, unless the type you are converting to is a void * , char * , or unsigned char *.

kazinator · on April 27, 2020

> C++20 now has a "hashable" concept to help with this

I'm sorry, you can easily solve this problem in K&R C from 1979.

shawxe · on April 27, 2020

Yeah, it's called memset. Not sure if this bug is supposed to be subtle or something, but if you require the padding bytes to be consistent then you need to consistently initialize your structs. Ignoring UB often leads to these sorts of bugs.

EDIT: Based on some of the comments I'm getting here, it seems like some of you have never implemented a hashmap/generic interface in vanilla C and it shows. If you want a hashmap that is generic and you don't want to write/specify a hashing function for keys on map initialization, then what you're likely to end up doing is ensuring that keys are initialized consistently, providing key size on map initialization, and simply performing a hash on the key as though it were a buffer of bytes.

Suggesting that this is somehow any more fragile than anything else in C or that templates should be used instead, which is an absurd comment since templates do not exist in C, is ridiculous. This comment is replying to a comment about how solutions to this problem have existed since K&R C--and they have. You don't need templates to not ignore UB, although if you are using C++ you can certainly use templates (and also take advantage of the stronger type system) to work around issues like this.

The point is that avoiding undefined behavior in C/C++ hashmap implementations is not something that has only recently become possible. C solutions may be more fragile, but that doesn't stop them from being "correct" in that will yield correct behavior unless an error is made elsewhere. Code that makes assumptions about the values of padding without explicitly setting those values is NOT correct and for anyone who works in a language like C/C++ regularly, that should be obvious.

saagarjha · on April 27, 2020

I don’t think memset is required to zero out padding bytes. The correct way to do this is to only access non-padding bytes.

shawxe · on April 27, 2020

sizeof(struct foo) is required to return the actual size, including padding, so

struct foo foo_inst; memset(&foo_inst, 0, sizeof(foo_inst));

will certainly result in all padding bytes being set to zero.

adrianN · on April 28, 2020

Reading padding bytes is UB, so the compiler could choose to optimize that memset call so that the padding bytes aren't touched.

saagarjha · on April 27, 2020

sizeof will include padding, but there is no need for memset to actually perform a write that cannot be consistently observed.

icedchai · on April 27, 2020

Are you saying memset does not have to write the entire struct? I understand it can be optimized out, but thought it if it was actually executed it had to set all of the specified bytes.

saagarjha · on April 28, 2020

memset must write to the bytes that underly the members of the struct. What it does to padding bytes is not of consequence to you, as you cannot observe it as far as I understand. (That is, if you tried to read it out it would not necessarily be the value you had memset, or even be consistent across reads.)

icedchai · on April 28, 2020

memset definitely writes to padding bytes, as noted by several other folks here. you observe these padding bytes during serialization and deserialization (disk, network, memory mapped buffer, etc.)

Note this is important enough that compiler folks consider it a bug if it doesn't work correctly. see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92486 for a recent example involving memset and memcpy.

saagarjha · on April 28, 2020

As mentioned in the bug report, the value of padding bytes is unspecified by the C standard. Actually, according to the linked Defect Report mentioned in the bug, there is a bit of discussion on whether the functions you mentioned should be entirely undefined to call or if some exceptions should be carved out. If the functions are legal to call, it is very likely that the results will be some arbitrary reification of the unspecified value.

icedchai · on April 28, 2020

There is some discussion of that, but it is wrong. There are many comments there, so you should read the whole bug report. The value of the padding bytes must be what is specified in memset.

kazinator · on April 27, 2020

memset is absolutely required to zero out padding bytes. Padding is part of the structure size, and memset hast to zero out exactly as many bytes as it is told.

memseting a structure to zero is commonly done in programs that send structure outside of the process (like passing it to communication or storage-related system calls) because the padding can leak sensitive information. That better work!

If the size of a structure didn't include the padding, then pointer arithmetic on structures wouldn't work correctly, and arrays of them would be broken/impossible. Arrays are the reason for the padding; given a struct foo * p, we need p + 1 to be properly aligned (for the sake of accessing all the members of * (p + 1). So struct foo cannot have a size like 5, if it contains a member of type int or anything else with alignment requirements.

saagarjha · on April 27, 2020

memset is not required to do anything if the side effects are not observable; this is the entire reason why memset_s needed to be added (and also why __attribute__((packed)) is recommended for anything that is being sent directly over the wire, if this kind of construction cannot be avoided). Compilers often inline and unroll small, constant-size memsets anyways, and since it is not possible to observe a consistent padding in C in a standards-compliant way to my knowledge, an optimizing compiler should be able to legally leave those bits alone.

kazinator · on April 28, 2020

I've been duped into participating in misleading thread without brushing up on this. The problem is that memset can be entirely optimized away when it's a dead store, which is reasonable:

  {
    struct foo x;
    // sensitive calculation with x
    memset(&x, 0, sizeof x);
  }

Basic liveness analysis (compiler technique from the 1970's if not older) tells us that the object has no "next use" at the point where it is being written by memcpy. That's a dead store that can be eliminated. The object is about to become toast. This is a problem for sensitive code (e.g. crypto).

I've been discussing only this case:

  {
    struct foo x;
    memset(&x, 0, sizeof x);
    // init x
    syscall(&x, sizeof x);
  }

Here, the memset cannot be optimized away. So we can only have some academic discussion about how part of the memset could be optimized away: that part which flosses the structure padding between the members and at the end.

That's a stupid and dangerous optimization that threatens a whole lot of code in the wild.

A good defense against this sort of time-wasting nonsense is "I'm not fixing anything without a repro test case; have a nice day".

saagarjha · on April 29, 2020

As I was discussing with 'gpderetta, it's complicated: https://news.ycombinator.com/item?id=23002423. Depending on how the followup to the DR I mentioned goes, your call is somewhere between "illegal as this invokes undefined behavior" and "sending structs across the wire better not be an infoleak". We'll see where the standards people take this. (Personally, I'm of the opinion that reading padding should always give an unspecified value–that is, a read can give an arbitrary, possibly inconsistent but valid value for that byte, but copies of it would have a fixed value. Code in the wild that has historically not cared for standards compliance anyways, e.g. Linux, should get a GCC flag for "make my padding bits what I want them to be".)

gpderetta · on April 27, 2020

Well, the contents of the padding bits is legally observable (with a memcpy for example) so the compiler can't assume they are unovservable.

saagarjha · on April 27, 2020

As far as I understand, padding bits are indeterminate until you observe them via a memcpy into a buffer, at which point they will collapse (only in the buffer) to some arbitrary but now-constant value. Is this correct?

gpderetta · on April 28, 2020

I think you are right in the general case. Definitely padding bits are not preserved during copies, but I think that at least in some cases they might be preserved, at least I think that if you memset+memcpy, the memcpy is guaranteed to see the zeros (what if you want to end the lifetime of the object and simply reuse the storage as an array of chars?). But there is divergence between C and C++ and also an area quite in flux (the notion of lifetimes of objects and memory locations gets tweaked every standard in C++, not sure where we are at now).

edit: interestingly, Biriba is yet another game in the canasta family, my theory is that different groups were originally playing these differents variants, but when they gained knowledge of the more popular variant (burraco), they started playing it but kept calling it with the original name. I guess that up until the internet era, these games were mostly passed via oral knowledge in casual groups.

edit2: macchiavelli [1] is another very fun game of the same family, but a lot more puzzle solving oriented.

[1] https://en.wikipedia.org/wiki/Machiavelli_(Italian_card_game...

saagarjha · on April 28, 2020

I looked it up for C, and as far as I can tell this has not standardized. (The Defect Report I read, which seems to have the latest opinion, literally calls such things "wobbly values". I'm not joking.) The TL;DR of Defect Report #451 seems to be that this is needs some work from the standard. In the case of an automatic struct or whatever (so you can't pull the bytes out from under it and reuse it) I think the consensus seems to be tending towards Heisenburg-ish values that have values that can change between reads, and certain library functions will be legal to call on them so that they "collapse" the state in the view that you've just created. For your case, I would have the question of whether two objects (e.g. a char array and a struct carved from that array) can share the same memory; in this case I would intuitively tend to agree with you that the padding bits are visible from the longer-lived object, kind of like an ad-hoc union. But I am not a licensed language lawyer, so I am not quite sure on that point.

I appreciate your edits, by the way ;)

gpderetta · on April 28, 2020

Thanks for hunting down the details.

And it seem that I should double check which comment I'm editing before submitting :)

kazinator · on April 27, 2020

memset_s is a pointless misinvention. I can't think of any situation in which I would use memset, and not want it to be what memset_s is claimed to be.

Which is to say that memset_s should just be called memset; there is no need for memset to be doing stupid things so that people must use memset_s.

I have no plans to use memset_s (ever), or to upstream any fix that involves using it unless the author provides a repro test case, and a proof that the problem can't be fixed with compiler options that make the problem go away with memset.

In the worst imaginable scenario, I will #define memset memset_s everywhere (after the inclusion of <string.h> of course, not before, and an #undef memset).

(The #undef memset may be enough, in fact, if the only problem is that memset is #define'd to some compiler built-in that doesn't properly implement classic C90 memset in all cases.)

rurban · on April 27, 2020

Agree. memset should be secure by default. That means it should not be optimized away by the compiler. And second the secure variant should flush the content, so that cache attacks can not read secrets which were memset'ed, but still in the cache on broken CPU's (Intel).

saagarjha · on April 28, 2020

sloooooooooooooooow. No, really, do you want a full flush every time you do a memset? That would be horrendous for performance.

kazinator · on April 28, 2020

"And second the secure variant should flush ..."

Probably means the extra-secure variant, more secure than the regular non-optimized-away memset that really clears the memory, but without the flush.

saagarjha · on April 29, 2020

Yeah, that's a more reasonable interpretation.

saagarjha · on April 27, 2020

Most of the Annex K * _s functions are pretty stupid, to be honest, but this is one that is actually useful. It's not that memset is doing stupid things; rather it's being smart and skipping work that it's not required to do.

kazinator · on April 27, 2020

memset has always been required to set N bytes starting at a given address to zero. That's the requirement. Been that way since it appeared in AT&T Unix and beat BSD's bzero to the ANSI C punch.

There is no need for a broken memset that fails to set some of the bytes, so that a fixed one under a different name has to be used in its place.

If you're using memset such that it's okay for memset not to set some of the bytes, and you'd like them not to be set if that makes things faster, then you shouldn't be using memset. You're using a hammer to drive a screw: wrong tool.

C has perfectly good initialization and assignment for structures.

End of story; I'm going to walk away pretend I never read this subthread, re-joining the hordes of C programmers using memset in the normal way, adding to the countless lines of code that do it that way and are never going to be changed.

This bullshit will be backpedaled out of the standard eventually, you just wait.

saagarjha · on April 28, 2020

…and assigning to a variable is supposed to put some bytes at a certain address, yes. Except compilers will skip doing so if you never read from the variable again, and the same thing happens here: the compiler can "know" that it has no need to actually do the assignment, as writing to padding bytes is not required to actually do anything and trying to observe the value is currently the subject of debate but somewhere between "undefined" and "whatever you get out of it has nothing to do with what you think went in".

kazinator · on April 27, 2020

One solution consists of using memset to initialize the structure to zero, and using memcpy to copy it instead of structure assignment. (Problem: pass-by-value in function calls won't use memcpy; abstractly, it uses member-for-member assignment which is not required to copy padding. A function that wants to calculate the correct has has to prepare a blank object with memset, then individually assign the fields into it from the incoming object.)

Another solution, more along the lines of what I was thinking, is simply to associate the hash table with a hashing function which processes the type as a structure, hashing the members individually rather than as a pad of memory.

C++ templates refine this by adding the ability to deduce the hashing function statically, and possibly inline it, which we could do with some preprocessing in C, along the lines of how those TAILQ macros from BSD work for linked lists.

shawxe · on April 27, 2020

> Another solution, more along the lines of what I was thinking, is simply to associate the hash table with a hashing function which processes the type as a structure, hashing the members individually rather than as a pad of memory.

This solution is probably what I would go for in most cases as well. The most compelling reasons I can think of for going the other way would be if there were a desire to use a specific hashing function/algorithm on all keys regardless of type or if there were a desire to have keys of different types in the same map.

banachtarski · on April 27, 2020

Wrong answer. This is brittle and relies on the memory always being initialized correctly and will be prone to all sorts of issues in the wild. Better is just either templatize on the key type or store the size on the map.

shawxe · on April 27, 2020

To be clear, there are obviously different/better ways to handle this in C++. We're talking about K&R C here.

banachtarski · on April 27, 2020

Right I mentioned "store size of the key on the map" as the second option for K&R/Ansi C.

saagarjha · on April 27, 2020

The answer is wrong more because it relies on padding bytes having consistent values than a lack of templating.

scatters · on April 27, 2020

memset, sure, then you copy the struct (return it or pass it by value) and the compiler won't bother copying the padding bytes. Or worse, an optimizing compiler will see that you're writing to padding bytes and helpfully no-op it.

shawxe · on April 27, 2020

First of all, there is no way the compiler is going to optimize a call to memset(&foo, 0, sizeof(foo)) when &foo is being interpreted as a void pointer. That doesn't even make sense.

Second of all, in a generic C interface keys are likely to be treated as void pointer and almost certainly are going to be moved around with memcpy etc. rather than returned/passed by value since doing so would make the interface non-generic.

jcelerier · on April 27, 2020

> First of all, there is no way the compiler is going to optimize a call to memset(&foo, 0, sizeof(foo)) when &foo is being interpreted as a void pointer. That doesn't even make sense.

that is really dangerous to assume. memset is a compiler built-in in every relevant C++ compiler and the compiler definitely knows the type of the object that is behind your void* and knows if you're being nasty.

e.g. look at this code : https://gcc.godbolt.org/z/xh9BXs

it's UB, and the compiler knows it and inserts an "invalid opcode" instruction even if you try to hide a memset behind a void*-taking function

saagarjha · on April 27, 2020

-Ofast makes non-compliant programs anyways, so it's not the best example.

jcelerier · on April 27, 2020

I had -Ofast in there but you can check that it's the same with -O2

scatters · on April 27, 2020

The compiler knows what memset does. One of the earliest steps of optimization is replacing calls to well-known functions with intrinsics. Try it! https://godbolt.org/z/QUREQi

Yes, it's true that generic C code will type-erase the key type. However it just takes a little refactoring in specific code to move the struct initialization across a call boundary from where it is passed to the generic code.

kazinator · on April 28, 2020

The compiler knows that memset clobbers an object, and can classify that as a dead store.

I'm skeptical about compilers optimizing memset not to cover padding between structure members.

Firstly, that would introduce security holes into a heck of a lot more existing code compared to code that uses a dead-store memset to wipe sensitive crypto.

Secondly, it wouldn't run any faster. Gaps in a structure and at the end exist in order to eliminate misalignment. Before most padding, there is a member that ends on a misaligned address. It's slower to update just that member, and leave the padding alone, than to clobber the padding.

For instance if we have a { char a; int b; char c; } structure, we gain nothing by zeroing just one byte of a, b and c.

In some compiler for an 8 bit system, this reasoning is likely false; I will worry about it when porting to that. Very little existing code will fit; you're coding from scratch for such things.

dependenttypes · on April 27, 2020

The HN markdown ate your stars.

shawxe · on April 27, 2020

Thanks for letting me know, should be better now! :-)

kazinator · on April 27, 2020

Basically, put a space after the star that you want to render verbatim, and don't try to do that in a region of text that is already italicized with stars.

E.g. "... being interpreted as a void * type."

kazinator · on April 27, 2020

A compiler that optimizes away memset is not one that could be used for targeting anything that sits on a network.

saagarjha · on April 27, 2020

That immediately rules out GCC and Clang…

kazinator · on April 27, 2020

I don't think so, unless we're mistaking memset for __builtin_memset.

scatters · on April 27, 2020

GCC treats the two identically. See https://github.com/gcc-mirror/gcc/blob/229c0ef777161ec5adfb7...

kazinator · on April 28, 2020

I see; because the sixth argument (BOTH_P) in that macro call is true.

munchbunny · on April 27, 2020

That's true but missing the point.

The point is that C++20 adds a way to catch yourself before you make the exact type of mistake the grandparent comment talks about.

https://en.cppreference.com/w/cpp/language/constraints

munchbunny · on April 27, 2020

It's abstracted behind the API's data types as long as you're using C++, but in order to work with the data types you often have to maneuver around the specific packing/alignment. The Windows ETW tracing API has several examples of specific "structure packing" phenomena:

1. Arranging fields to pack things that are shorter than 4 bytes along 4 byte boundaries.

2. Manually arranging data buffers in very specific layouts (this is less about the original post, more about a different interpretation of "packing").

Aside from what I mentioned above, more commonly you will run into this in interop scenarios, such as calling C/C++ API's from C#. Thankfully those scenarios are few and far between.

kazinator · on April 27, 2020

I've documented how GCC does bitfield packing in the TXR reference manual. Or rather, the abstract algorithm used in the FFI to replicate it.

https://www.nongnu.org/txr/txr-manpage.html#N-027D075C

This is the result of empirical investigation.

The description also covers allocation of non-bitfields (paragraph 3) and the padding of the structure (paragraph 9) which require few words.

I felt that the bitfield handling is so obscure that it had to be documented in detail. If someone is to know exactly what the layout will be, the documentation can't just be "oh, it will behave like a GCC struct". Well, what will that do? That is not adequately documented anywhere.

If I have to work with bitfields in just C, I can use that as a reference to understand what the compiler will do (at least if it's anything compatible with GCC).

not2b · on April 27, 2020

You could have saved time and gone to the ABI spec.

https://itanium-cxx-abi.github.io/cxx-abi/abi.html

This was originally developed as a joint effort to make compilers ABI-compatible on Itanium, but it's also used (by GCC, clang, Intel's proprietary compiler and others) on x86-64.

An old Hacker News comment said that it's from Intel; it's not, it was a joint effort with lots of work from CodeSourcery and Red Hat folks.

kazinator · on April 29, 2020

The document you link to is entirely about C++, and makes numerous references to some "base (C) ABI".

"The size and alignment of a type which is a POD for the purpose of layout is as specified by the base (C) ABI"

The links in 1.5 Base Documents are old and broken.

carapace · on April 27, 2020

That's awesome! It's the kind of detail that, when you need it you really need it, but it's often so hard to find, or locked in some proprietary deal. I can barely imagine the amount of work it must have taken to nail all that down. Congratulations, and thank you!

kazinator · on April 27, 2020

I tried numerous cases, and looked at the memory, and also read and wrote the structures with FFI to make sure they match what the C compiler is putting out, and fixed bugs along the way. For big endian investiations, I borrowed the big endian PPC machine courtesy of the GCC Compile Farm project.

The details are not obvious; like the fact that a zero width bitfield like "int : 0" that appears etween two members that are not bitfields actually does something. E.g. this has size 5:

   struct {
     char c1;
     int : 0; // zero-width bit-field must be unnamed
     char c2;
   };

This is basically because c1 is de-facto considered to be 8 allocated bits out of an int-wide cell, leaving 24 bits in that cell. The int : 0 sees that a field has been partially filled and so increments to the next int-wide field (according to my documented hypothesis).

ISO C says (or did say in 1999) only this: "A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field.105) As a special case, a bit-field structure member with a width of 0 indicates that no further bit-field is to be packed into the unit in which the previous bit-field, if any, was placed."

No "further bit-field" is to be packed, but in this example there is neither a previous nor next bit field. So you might expect that there is no effect. In the GCC model of "all allocated so far are just bits", it has an effect.

Footnote 105 says just that "An unnamed bit-field structure member is useful for padding to conform to externally imposed layouts" which is more or less self-evident.

Oh wow; I just realized that the empty bit-field has an effect if it is the last member also:

   struct {  // now size 8!
     char c1;
     int : 0;
     char c2;
     int : 0;
   };

This is predicted by my documentation, but it should be spelled out in an explicit remark.

It's a useful feature of GCC bit-fields because you can conform to certain external layouts without having to use bit-fields at all, other than the zero-width ones.

carapace · on April 27, 2020

Holy wow.

People like you who are willing and able to do this are pillars for the whole of our field. You're a hero. :)

0xDEEPFAC · on April 27, 2020

Looks like what he really wants is to use Ada which has had much better support for low-level programming than C.

Example:

Word : constant := 4; -- storage element is byte, 4 bytes per word

type State is (A,M,W,P); type Mode is (Fix, Dec, Exp, Signif);

type Byte_Mask is array (0..7) of Boolean; type State_Mask is array (State) of Boolean; type Mode_Mask is array (Mode) of Boolean;

type Program_Status_Word is record System_Mask : Byte_Mask;

       Protection_Key     : Integer range 0 .. 3;

       Machine_State      : State_Mask;

       Interrupt_Cause    : Interruption_Code;

       Ilc                : Integer range 0 .. 3;

       Cc                 : Integer range 0 .. 3;

       Program_Mask       : Mode_Mask;

       Inst_Address       : Address;

 end record;

 for Program_Status_Word use

   record
       System_Mask      at 0*Word range 0  .. 7;

       Protection_Key   at 0*Word range 10 .. 11; -- bits 8,9 unused

       Machine_State    at 0*Word range 12 .. 15;

       Interrupt_Cause  at 0*Word range 16 .. 31;

       Ilc              at 1*Word range 0  .. 1;  -- second word

       Cc               at 1*Word range 2  .. 3;

       Program_Mask     at 1*Word range 4  .. 7;

       Inst_Address     at 1*Word range 8  .. 31;

   end record;

 for Program_Status_Word'Size use 8*System.Storage_Unit;

 for Program_Status_Word'Alignment use 8;

More info: https://www.adaic.org/resources/add_content/standards/05aarm...

samatman · on April 27, 2020

For all its somewhat fussy verbosity, Ada really impresses me every time this sort of thing comes up.

I keep hoping the Zig developer will do a deep dive on Ada and bring over more of this kind of precise control. A language where I have this kind of control over layout, but can still spell `end record;` as `}`, is ideal for some projects I have in mind.

SlowRobotAhead · on April 27, 2020

I use C for embedded. I do the “lost art” of structure packing all the time. I don’t want ada.

I’m not sure why “use a different language” is such a common reply to any language specific discussion.

grandinj · on April 27, 2020

I wrote a clang plugin to look for opportunities across a 10M line codebase and there was surprisingly little to be found. Why? Because on 64-bit Linux, the current C++ ABI mandates quite large alignment, especially once you are embedding things inside other things. Packing is still sometimes useful in speeding things up, but tends to require bitfields and flattening structs inside structs into a single struct, etc.

garjana · on April 27, 2020

When I worked at a prop trading firm on a greenfield market data system this kind of optimization was very much on our minds. I assume others in this field also take care to pack structs.

adrianN · on April 27, 2020

It would be nice if there was an annotation that just lets the compiler do all the optimization for me for the cases where I don't care about the memory layout of the struct. Just like the Rust compiler can do without repr(C)

dan-robertson · on April 27, 2020

This is hard. If your struct definition is in a header file then the compiler needs to generate correct field offsets for anything that includes that header file, so it would need to always optimally pack structure the same way based only on the definition and not on the usage.

People will also do silly things like casting between types in ways that rely on similarly written structures having similar memory layouts. So C is probably a bad language to turn this on by default in

htfy96 · on April 27, 2020

Automatic reordering of fields is great, but sometimes people know more about how they will be used (like frequently accessed groups of fields, or leave some fields in the first 256 bytes so a u8 relative index could be used to save space), so manual reordering still exists for a reason beyond packing.

ChrisSD · on April 27, 2020

I believe adrianN was suggesting that it would be useful if you could opt-in to automatic reordering on a per struct basis. Currently C doesn't give a choice other than to do it manually.

elteto · on April 27, 2020

Wait, forgive my ignorance, isn’t this the default? If you don’t care about the memory layout then won’t the compiler reorg / pad structure members to fulfill alignment?

flohofwoe · on April 27, 2020

Nope, C and C++ compilers are not allowed to reorder struct fields (AFAIK at least, I haven't seen this yet in any real world compiler), but they will add padding bytes for natural alignment (and that's where the "waste" is coming from).

IMHO there are just as many arguments for automatic reordering as there are against it (e.g. creating structs that are layed over memory mapped IO registers, or just optimizing a struct for certain cache-efficient access patterns). In my opinion it's sufficient to know about the existance of alignment-padding, and how to work around it if needed (for instance reordering the struct members manually, or using #pragma pack)

TorKlingberg · on April 27, 2020

Yes, the C and C++ standards could have allowed the compiler to reorder structs, but that would have lead to even more of those undefined behavior situation that people complain so much about.

People do quite often rely on the first struct element being at the start of the struct. Memcpy:ing directly between structs and network/disk is also common, but naughty. Both struct padding and endianness already break that.

Chabsff · on April 27, 2020

The rule needs to be deterministic accross compilers, since a library can be compiled using a compiler and linked in an executable using the same header.

SlowRobotAhead · on April 27, 2020

A practical reason you can’t just allow the compiler to do it is if you are doing goofy things like overlaying two structures, void pointer manipulation or casting one object to the format of another because you know the first n number of elements line up anyhow, or if you’re changing specific bytes as memory manipulation and not by the object itself.

Typically if I define a 8bit as the first element, I need to be certain those 8bits are first even if that wastes three more bytes to align on the next variable.

steerablesafe · on April 27, 2020

Unfortunately pointer comparison between addresses of subobjects has a defined order, so reorg is not possible.

virgilp · on April 27, 2020

Pad - yes, typically. Reorg - seldom.

klodolph · on April 27, 2020

Reorg is prohibited by C standard.

gpderetta · on April 27, 2020

It is complicated. Because of the as-if rule, a compiler can do anything it wants as long as a conforming program can't tell the difference. So for example if a compiler can prove that a program doesn't compare fiel addresses, doesn't cast pointer to a struct to its first member, etc it can reorder as it pleases. Turns out it is very hard to prove, often requires whole program optimization and the gains are questionable, so it is seldom done. Both clang and gcc had such an optimization pass in the past but it got dropped.

virgilp · on April 27, 2020

I know, it wasn't obvious to me that elteto talked exclusively about C/C++. AFAIK few -if any- compilers/VMs take it upon themselves to reorder struct members (even outside C/C++ world).

saagarjha · on April 28, 2020

Both Swift and Rust have an unspecified structure ordering by default, which almost always boils down to "the compiler will lay them out in the order specified, except if you leave enough padding to fit a member, in which case it'll reorder the elements for you".

leni536 · on April 27, 2020

Even though struct member reordering is prohibited, the order of std::tuple elements are not fixed by the standard. There could be an stdlib implementation that used this fact to reorder tuple members optimally.

AFAIK there are PoC 3rd party implementations for such tuples.

elteto · on April 27, 2020

I see, thanks all.

nly · on April 27, 2020

Great idea, but you have to define these optimizations and do them deterministically, even for debug builds, all the time - otherwise you'll never have a stable ABI.

Natsu · on April 27, 2020

I'm sort of surprised there are no tools for this. I understand why having the compiler reorder things could be bad, though it seems like there should be room to tell the compiler it's okay to repack it for minimum space, but I don't even see any mention of a source-level tool that would just sort the items in a struct for you.

It seems like something like that could be useful rather than making programmers try to order their structs by hand.

smcameron · on April 27, 2020

There's pahole https://lwn.net/Articles/335942/

SlowRobotAhead · on April 27, 2020

If you just order top down in structures from pointers, 32s, 16s, arrays, 8s, it’s almost entirely done without thinking.

There is almost never a difference to the user what order things are structured. Although to be fair this does get tricky with unions of structure over structure.

Natsu · on April 28, 2020

True, but that's the kind of drudgery that's best farmed off to computers. I see that the comment above mentions there is a tool for this that I simply wasn't aware of.

dorianh · on April 27, 2020

I always assumed that in C, a given structure would always have the same memory layout, no matter what the compiler is (as long as the compilers target the same architecture of course).

I always assumed that in C++, the memory layout could change a lot between compilers (the location of the pointer to the vtable for example). Do you know if it's true, and if the layout do change, could you give an example?

saagarjha · on April 27, 2020

> I always assumed that in C, a given structure would always have the same memory layout, no matter what the compiler is (as long as the compilers target the same architecture of course).

Usually, but not always. In most cases there is one efficient way to pack the structure and still maintain member alignment, but there's not requirement that the amount of padding looks like this.

> I always assumed that in C++, the memory layout could change a lot between compilers (the location of the pointer to the vtable for example). Do you know if it's true, and if the layout do change, could you give an example?

Yes, once you have a non-POD type the memory layout can be fairly arbitrary as you cannot really inspect it and compilers are free to lay it out as they wish.

ChrisSD · on April 27, 2020

Most platforms have have system level C APIs that are exposed to userspace. When using these APIs it's necessary to layout structures as they expect. Therefore all compilers on the same platform (OS+arch) would usually produce the same layout for structs so that they are compatible.

However this isn't universally true. Some platforms might not have a well defined C ABI.

throwaway_pdp09 · on April 27, 2020

It's probably not talked about so much now because of the canard that memory is cheap, and because HLLs disguise things a bit too much and so mislead newbies, but to suggest it's lost is plain wrong.

saagarjha · on April 27, 2020

I think the main reason is that many new languages do this for you, at the cost of not guaranteeing a particular structure member order.

WesolyKubeczek · on April 27, 2020

Sheesh, we're trying to be ABI-compatible over here.

saagarjha · on April 27, 2020

For interoperability concerns within the language itself, this is usually solved by making the layout algorithm deterministic or passing around the aggregate around with an invisible pointer. When interacting with other languages, typically there's an attribute to ensure that layout matches declaration order.

jb1991 · on April 27, 2020

Biggest surprise for me is learning that a pointer is a whopping eight bites! I have always mistakenly assumed they were small and just four bites.

saagarjha · on April 27, 2020

It depends on the platform! On ILP64 and LP64 (and presumably P64, though I've never heard of this actually being used anywhere) they'll be 8 bytes, but not on most 32-bit architectures.

flyingfences · on April 28, 2020

Isn't that the _definition_ of a 32-bit (or 64-bit or 8-bit or however-many-bit) - that 32 bits is the length of a pointer?

saagarjha · on April 29, 2020

Certain strange platforms (arm64_32 for Apple Watch) run ILP32 on AArch64.

joosters · on April 27, 2020

TLDR: When defining a struct, put the biggest items first. This works well in almost all cases.

If you need to squeeze out all padding, use compiler directives like '#pragma pack', but be aware of the performance implications.