The Lost Art of C Structure Packing (catb.org)
335 points by Tsiolkovsky on Jan 1, 2014 | 143 comments



I do optimal structure (and bit) packing without much thought because it is an old habit. As the article states, I have noticed that the only other people that do habitual careful structure optimization these days have been doing low-level and high-performance code as long as I have. Most programmers are oblivious to it.

The reasons you would do it today are different than a decade ago and the rules have changed because the processors have changed. To add two clarifying points to the original article:

- The main reason to do optimal structure packing today is to reduce cache line misses. Because cache line misses are so expensive it is a big net performance gain in many cases to have the code do a little more work if it reduces cache line fills; optimal structure packing is basically a "free" way of minimizing cache misses.

- On modern Intel microarchitectures, alignment matters much less for performance than it used to. One of the big changes starting with the i7 is that unaligned memory accesses have approximately the same cost as aligned memory accesses. This is a pretty radical change to the optimization assumptions for structure layout. Consequently, it is possible to do very tight memory packing without the severe performance penalty traditionally implied.

What constitutes "optimal" structure packing is architecture dependent. The original C structure rules were designed in part to allow the structures to be portable above all else. If you design highly optimized structures for a Haswell processor, code may run much more slowly or create a CPU exception and crash on other architectures, so keep these tradeoffs in mind. The article is discussing basic structure packing which typically has easily predictable behavior almost anywhere C compiles.


Why doesn't the compiler handle this for the programmer? Seems "free" in the no-tradeoffs sense.


> Why doesn't the compiler handle this for the programmer? Seems "free" in the no-tradeoffs sense.

Traditionally, C and C++ programs have been compiled by running the compiler on each .c file, and then linking together the results. If each invocation of the compiler could put the fields of a structure in arbitrary order, the different object files would not work together correctly. C structures do not include runtime type information and all offsets are calculated during the compile phase. The same issues come up with libraries. We need a stable application binary interface, or ABI.

Even if the ABI problem was solved, there are advantages to letting the programmer determine the order of fields in a structure. If you know that different threads are going to be using different parts of a structure, you will want to arrange things (or insert padding) so that the different threads are using different cache lines. This avoids so-called "false-sharing."


>If each invocation of the compiler could put the fields of a structure in arbitrary order, the different object files would not work together correctly.

That just requires that your packing algorithm be deterministic and there's no reason for it not to be.

I suppose it would fail in cases where someone defines a 2 element struct in one file that's a subset of a 3 element struct in another file, and then casts the 3 element into the 2 element.


Which fails if I am using two different compilers and sending a structure over a network connection!

That is why I am familiar with struct packing actually, for creating protocols it is essential each end has the structure defined and laid out in the exact same way.


Erlang's type system with binary data-type is pretty awesome.

Example: https://groups.google.com/d/msg/erlang-programming/s39IQlk-d...

I wish more programming languages had baked-in support for bitfields, bit arrays and endian conversion, because building interoperable, efficient binary clients without icky code generation is currently a PITA.


> That just requires that your packing algorithm be deterministic and there's no reason for it not to be.

The packing algorithm would also need to be static, since object code produced by different compilers, or by different compiler settings (e.g. -O2 vs. -O3), is still expected to link together correctly without error.

Some kind of attribute declaration in the struct definition itself could manage this, assuming that the algorithm it specifies is deterministic. But it can't be completely automatic, since that would differ from the current (deterministic) algorithm.


The offsets of code and data blocks are already undetermined before linking. Even if the algorithm is non-deterministic, what's stopping the optimisation from not running until link time?


sizeof (foo) would not be known until link time.

static_assert(sizeof(foo)==64, "you didn't align foo correctly") would not be possible.


Sometimes you need to access a structure from assembler code, so the structure layout has to be manually set up.


But why not let the programmer opt in to manually ordering the fields, and let the compiler do it the rest of the time? Similar to how the inline specifier works for functions, the compiler can inline things that it thinks will be faster if inlined, but if the programmer knows that can't be done, or must be done, they can say so.


The elements of a C struct must be sequentially ordered in memory as specified in the code, that's defined in the standard.


In turn, various C code is then written depending on this expectation, meaning a non-compliant compiler would break a significant number of codebases.


On the other hand, I for one would happily throw such brittle, unportable code under the bus in the name of automatically reducing cache line misses.

About the only safe usage of structure memory layout that I can think of is offsetof, and even then, I think you could make offsetof's behaviour definable in the ABI to make it deterministic.


That unportable code includes CPython, Sun JVM, essentially anything GUI related in Windows and anything that uses BSD sockets.


> On the other hand, I for one would happily throw such brittle, unportable code under the bus in the name of automatically reducing cache line misses.

Spoken like someone who's never been forced to fix such code left behind by previous coders.

> About the only safe usage of structure memory layout that I can think of is offsetof, and even then, I think you could make offsetof's behaviour definable in the ABI to make it deterministic.

Sure, especially considering offsetof is standard defined and allowed to be compiler implemented AFAIK. That said, you bring up a great point: You're talking about breaking decades old ABIs if you were to enable such memory layout reordering by default. Good luck tracking down updated .libs or original source to build .libs from for every single one of your dependencies!


I'm not sure you know what "unportable" means. Every program compiled by a compliant compiler will work. "Unportable" doesn't apply to compilers which do the wrong thing; it applies to programs which assume things outside what is in the specification.


Actually, the C standard does not directly specify that. The only fact about structure layout that is explicitly required by the standard is that the address of the first field is the same as the address of the structure itself; other constraints on structure layout can be deduced from requirements scattered through the standard, but nothing in the standard requires the in-memory field order to be the same as the declaration order.


"A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type"

I think sequentially allocated states that fact. From what information do you mean it can be deduced?


It seems that my memory does not serve me well :)

ISO C actually explicitly specifies that the order is the same, in §6.7.2.1.13.

The "deduced" remark was essentially intended to cover constraints that come from 6.2.6.1.2 ("...objects are composed of contiguous sequences of one or more bytes..."), i.e. fields cannot be (conceptually) outside &foo and ((char*)&foo)+sizeof(foo), which can be hard to implement on sufficiently weird architectures.


It would break things. If you have a structure that is part of a network transmission or file format, reordering the fields would read the wrong data. On top of that C programmers are used to doing memory hacking type things like taking the address of fields and in some case may need to rely on the certain memory layout of a struct being what they declared it as.


You can't use a struct in a portable way for sending directly on the network or writing/reading to disk. You must always properly serialize the data despite the fact that the _order_ of the elements is defined.


I would assume you can't safely do arithmetic within a structure, for instance. As an example, take a

  struct {
    char a;
    char b;
    char c;
    char *p;
    char d;
  };
A valgrind error that points at "0 bytes within an area of memory..." would signal an error originating somewhere in the struct, but who knows whether it's p or a.


I believe one of the reasons for adding "class" types in C++ was to break from C's restrictions on struct layout. But even then, the trouble of breaking binary compatibility was too much to pay for freeing programmers from minding their own data.


No, the same restrictions apply to class as to struct. The most relevant difference in C++ is that the compiler can reorder data members that are in different visibility specifications (public, protected, private) but not within one of those blocks, to maintain compatibility with C.


Well you only have guaranteed compatibility with C if a class is standard-layout, which requires that all members have the same access control (public, protected, private). So if there is any rule in C++ that mandates ordering within a "block" when there are multiple such "blocks", it's not for compatibility with C.


This is what I said. Even though they could change the rules for the classes they did not because it's far from "free".


Funnily, this is one of the freedoms granted by the standard which none of gcc, clang, or MSVC actually makes use of.


Oh, putting "class" in the first sentence was a mistake as people hit downvote without reading the rest. I do not say that "class" allows rearranging members (other than C++ access-based rearrangement that applies to structs as well). Really. Please read.


This is one of those classic things where we could actually improve C for performance. The standard is restrictive here - data that is required to be identical in all cases for network, file, etc. purposes should be marked with a special keyword - or, for legacy purposes, we mark the ones we want the compiler to reorder for us.

It's more complicated than just keeping the size down, though - you might want things to start on 64-, 128-, or 256-bit boundaries as well - e.g. if your class contains aligned vector members and is itself aligned.


Because if the compiler reorders your fields behind your back, you won't be able to know how things are serialized. You lose control over the memory layout.


Starting with "the i7"? What microarchitecture do you mean?


The Nehalem microarchitecture made aligned and unaligned memory accesses have the same latency and throughput, except in weird edge cases like loads which span an L1 cache line boundary.


It used to be the case that unaligned memory accesses were at least twice as slow because they literally required multiple bus cycles, but for x86 that hasn't been true since the Core 2 at least: http://lemire.me/blog/archives/2012/05/31/data-alignment-for...


Thanks.


One trick that's not mentioned is unbundling the struct. Suppose you have a struct with a pointer and a character in it, and a huge array of those structs. If you resent the padding tax, refactor your code to use and pass around two arrays instead, one of pointers and the other of chars.


Just a note, this is sometimes called SOA (Structure of Arrays) or AOS -> SOA refactoring and comes up a lot when trying to take advantage of vector operations, too.


It's interesting this is making a revival in C for efficiency. When I was first learning to code ('90s), this style of "parallel arrays" was seen as a scripting-language wart. When you semantically want an array-of-struct in a bash script, you resort to parallel arrays instead, because it's what you can do easily.


> When I was first learning to code ('90s), this style of "parallel arrays" was seen as a scripting-language wart.

When I first learned to program in the 80s, this was known as "Fortran order" (as opposed to Pascal or C order), and in the later nineties, I think that "column major" and "row major" became popular, and they still are.

This goes back to sometime in the '60s, with APL and Fortran doing column major on purpose, BASIC by accident, and most of the rest doing row major.


> this style of "parallel arrays" was seen as a scripting-language wart

It was only a wart because those languages didn't support compound types at all (and in some cases they did support compound types, but people moving to a better language were never taught them and kept their old techniques). It is compounded by the fact that most scripting languages are untyped, so structure packing - the main justification for using arrays like that in more modern languages, whether for general memory use, stack size, or cache-hit optimisations - isn't nearly as relevant.

Basically if you see parallel arrays used without a comment stating that it is done for packing reasons or similar, it is there either because the language does not support compound structures well or because the programmer knew no better. In a language where structure packing can be useful, it is a perfectly valid optimisation (though for small data not in tight loops, it may be an optimisation too far).


It's also worth noting that it comes at a cost of locality for an individual row, so isn't always an optimization.


In fact, in Java this is the only way to simulate structs as POD rather than paying for the overhead of an array of objects.


Sorry, POD?


plain old data


Ah, thanks.


Not the only way. You can interleave fields in a ByteBuffer.


It's also great for minimizing cache misses if you find yourself iterating over arrays of large structs and just operating on one field from each of them.


The ISPC[1] language has syntax for requesting the compiler perform this transformation for you, giving you the performance benefits for free without requiring manually writing out the code or even paying the price in syntactic clarity (you can still write out p[0].x+p[1].x instead of px[0]+px[1]).

[1] http://ispc.github.io/


Some Lisp systems implemented pairs this way: a global array of CARs and a global array of CDRs.


Working with unzipped structures (xs+ys instead of xys) can be a bit of a pain. It seems like a transformation that a compiler should be able to do easily, given enough information about structure and (ugh) aliasing. I wish C had real array types. :(


Happily, in Haskell the standard array library does the struct of arrays format for unboxed arrays. It's super handy!


I thought the array package was deprecated. Its mutable API is nicer than vector’s, though, if I recall correctly.


Vector does the SOA representation too.


Yup, Vector is the one I have in mind. (Vector is also an array lib.)


Is this a "lost art"? I always consider the layout when I'm writing a C struct. It's the principal concern that governs the correct ordering of the members.


I was wondering that myself. This doesn't seem like a "lost art" so much as a basic necessity for even moderately constrained applications. I can't remember the last time I did not consider the layout of members of my C structs (other than toy, or proof-of-concept applications).


Agreed. This is just how normal people, err, construct structs.

> You are not an advanced C programmer until you have grasped it. You are not a master of C until you could have written this document yourself and can criticize it intelligently.

In other words, a "hacker" is someone just like ESR. More blowhardery. I for one don't claim to be a master...


To be fair, he didn't claim that by writing this document he was a master of C, or that a master would only know this document, just that it was a piece of knowledge a master should definitely have. Phrasing, I guess.


"A master of C must have the same views of what's important and the same writing style that I do"

i.e., 'a hacker is someone who looks like me'.


That is not a charitable interpretation of what he wrote. Try, "A master of C must already know about structure packing deeply enough to teach it".


Try reading some more ESR, that is a common theme in his writing, unfortunately.


I agree that "people being uncharitable about ESR" is a common theme.


No, seriously. This is a common theme in just about everything ESR writes. For example, for some inexplicable reason, the Jargon File used to have an entry stating that hackers tend to be libertarian. This, of course, was written when ESR self-identified as a libertarian. Later on (post-9/11), he decided that he was a neocon instead. Strangely, the entry in the Jargon File was then updated to state that hackers tend to be neocons.

That's just one particularly egregious example. There are a gazillion others. Pretty much everything the guy has ever written has similar implications, where he defines a hacker as being whatever he sees himself as at the time. He really really wants to believe that there is a very specific hacker subculture (in his words, "our tribe"), and that they have a shared culture, heritage, and beliefs about things, and that they have some sort of meritocracy where certain members of that subculture are universally agreed upon as being wise and correct about everything (in his words, "the elders of our tribe"), and of course, that he is one of those people at the top of the imaginary meritocracy in this imaginary subculture.

Also, for bonus douchebaggery points, as if he needed them: http://esr.ibiblio.org/?p=208


> For example, for some inexplicable reason, the Jargon File used to have an entry stating that hackers tend to be libertarian. This, of course, was written when ESR self-identified as a libertarian. Later on (post-9/11), he decided that he was a neocon instead. Strangely, the entry in the Jargon File was then updated to state that hackers tend to be neocons.

The current entry:

> Formerly vaguely liberal-moderate, more recently moderate-to-neoconservative (hackers too were affected by the collapse of socialism). There is a strong libertarian contingent which rejects conventional left-right politics entirely. The only safe generalization is that hackers tend to be rather anti-authoritarian; thus, both paleoconservatism and ‘hard’ leftism are rare. Hackers are far more likely than most non-hackers to either (a) be aggressively apolitical or (b) entertain peculiar or idiosyncratic political ideas and actually try to live by them day-to-day.

archive.org says this was the same in 2003. (Possibly it changed and changed back, but I assume not.)

From March 2000 (v4.2.2 at http://jargon-file.org/archive/ which I selected somewhat at random, I didn't try to find the earliest "politics" entry):

> Vaguely liberal-moderate, except for the strong libertarian contingent which rejects conventional left-right politics entirely. The only safe generalization is that hackers tend to be rather anti-authoritarian; thus, both conventional conservatism and `hard' leftism are rare. Hackers are far more likely than most non-hackers to either (a) be aggressively apolitical or (b) entertain peculiar or idiosyncratic political ideas and actually try to live by them day-to-day.

ESR in 2008 ( http://esr.ibiblio.org/?p=301 ):

> I am not and have never been a conservative. Much less a “neocon”, whatever that means.

"Hackers tend to be libertarian" and "hackers tend to be neocons" are not sentiments expressed by either variant. "He decided that he was a neocon" just seems to be plain false. Whatever the merits of the changes he made, I claim that you are being uncharitable towards him.

I'm not necessarily defending ESR himself, I just think that you're attacking someone who isn't ESR and calling them ESR. And I think this is a common theme when people talk about ESR.


Before 2001 it was "vaguely liberal-moderate... strong libertarian contingent... anti-authoritarian", when ESR identified as a libertarian.

After 2001 it was "formerly liberal-moderate... moderate-to-neoconservative", when ESR had gone full-on warblogger. Around the same time he also added heavily politically biased definitions for terms like "fisking" and "idiotarian" to the Jargon File.

He claimed not to be a neocon in 2008 when "neocon" had become a dirty word. I don't think that changes the fact that he was a fervent supporter of the "War on Terrorism", a neocon project.

It sounds to me like you're agreeing with the parent -- he changed the description of a "hacker's" politics to fit whatever his particular political stance was at the time.


http://en.wikipedia.org/wiki/Fisking points out that the style of rebuttal had been common on Usenet for years, there just wasn't a word for it until 2001. I don't think most people are familiar enough with Fisk to give it any political connotations.

I agree he should stop trying to make "idiotarian" happen and it's completely out of place in the File.


From the ESR Jargon File: "Named after Robert Fisk, a British journalist who was a frequent (and deserving) early target of such treatment." He was "deserving" because warbloggers like ESR hated him.


> After 2001 it was "formerly liberal-moderate... moderate-to-neoconservative"

I'm pretty sure ESR identified as a libertarian when he made this change. Why did you delete the "... strong libertarian contingent... anti-authoritarian" from this bit?

> I don't think that changes the fact that he was a fervent supporter of the "War on Terrorism", a neocon project.

The parent said he identified as a neocon. Admittedly, I don't know why he would use the word in 2001 if he didn't know what it meant in 2008. But I don't think it's true. I don't think there is any time (within relevant history, probably ever) when ESR thought he was a neoconservative, even if other people would call him one.

> he changed the description of a "hacker's" politics to fit whatever his particular political stance was at the time.

I'm not saying he didn't do this, but I think it's much less obvious that he did this than the parent (and yourself, to a lesser extent) made out. The parent said that when he identified as a libertarian, the jargon file said that hackers tended to be libertarian (which it didn't); and that when he identified as a neoconservative (which he never has) it changed to say that hackers tended to be neoconservative (which it doesn't).

If ESR is actually doing bad things to the jargon file, we should be able to point at them instead of telling falsehoods. Maybe the "liberal-moderate" to "moderate-neoconservative" change was in fact a bad one. I do consider it plausible that it was a politically-motivated change with no basis in reality. But if so, we should say things like "I don't think there's any evidence that hacker politics actually changed in the relevant time period", or better yet, "I have evidence that they didn't". Instead we have things like "ESR decided he was a neocon and updated the jargon file to say hackers tend to be neocons", which is a great soundbite if you want to make him look bad, but it's not true.

(I'm not defending "fisk" and "idiotarian". But the parent didn't mention them. The parent made one specific accusation about ESR, which was not true, and I pointed out that it was not true. The fact that what the parent said, and what actually happened, can be interpreted in the same light, does not mean that I agree with the parent.)


I deleted the "strong contingent" part because it didn't change. When ESR made that change, he had very visibly aligned himself with the neoconservative war movement. It doesn't matter if he referred to himself as a neocon (the parent never said that), simply that he backed the neoconservative project.

I don't think ESR's evaluation of what a "hacker" is means anything, so I don't think there's any measurable way to say whether "hacker politics" changed. That's the point: ESR identifies a "hacker" as someone like himself, and that definition changes with him.


So I'd like to step back, because these arguments have a tendency to start to miss the point.

My general claim here is "people treat ESR uncharitably". In this thread, that started with people choosing an unfavourable interpretation of "you are not a master of C until...". When I pointed out that this was uncharitable, people said "this is a common theme in ESR's writing". I think that what they meant was something like "it's okay to take the uncharitable interpretation, because ESR often writes things similar to the uncharitable interpretation, so it's probably correct".

mwfunk gave a specific example of ESR acting similar to the uncharitable interpretation. I pointed out that the example was simply not true. It made three verifiable claims, and all of them were false. I call mwfunk uncharitable for this.

The example was "ESR took action Y". You said my evidence showed that ESR actually took action Z, and Y and Z are both instances of ESR taking meta-action A (updating the jargon file to say hacker politics match his own). And if ESR has done A, then uncharity becomes more justified.

But to interpret Z as A is much easier if we accuse ESR of being a neoconservative. This, too, is uncharitable: he has said that he is not and has never been a conservative, and that conservatives are villains. He did align with neoconservatives on one issue that a lot of people fervently disagreed with neoconservatives about, but that doesn't make him a neoconservative. And if ESR isn't a neoconservative, and merely aligns with them on one issue that people care a lot about, then Z is much less strongly an instance of A.

You have leveled accusations against ESR that I think stand by themselves. But they're far less damning than we started with, and in the meantime, this thread has been filled with examples of uncharity, which is exactly what I'm objecting to.

(On a broad level, I think what we have here is uncharity piled upon uncharity. ESR says something, and people interpret it uncharitably, because the uncharitable interpretation is how ESR acts; and they know this, because of uncharitable interpretations of previous things that he's done. But those interpretations are justified because...

And perhaps this tower bottoms out with something that is not uncharitable, but every level of uncharity makes the next one less justifiable.)


Fair enough. I really appreciate hearing your views on this.

I think ESR's claims not to be a "neoconservative" stink of protesting too much. What he identifies as being at any given time doesn't mean a whole lot to me -- people can call themselves whatever they want. After seeing him dance and dissemble around his support of "race realism" for years I think it is overly charitable to take anything he says about his political categorization at face value.

If it helps, look at the other end of his change: he changed the description to say "formerly liberal" at the same time he was actively jumping into the fray as a "warblogger" and "anti-idiotarian". If he didn't change the definition to explicitly include his own politics (I believe he did) at the very least he changed it to exclude his political opponents.

All of the changes we've talked about here follow one pattern: altering work under his control to say that a "hacker" (which ESR believes to be a title of honor) is someone who resembles him more and resembles people he dislikes less. His commentary in the linked article follows the same pattern of self-aggrandizement by identification with an idealized and lionized hacker.

This is a lot more than I had hoped to ever write about ESR. All that said, if you told me I had to have dinner with him, Linus, or RMS, I'd choose ESR in a heartbeat. He also is a pretty good writer and evangelist and seems to be an effective organizer.


Not just 'hacker', but 'human being': http://esr.ibiblio.org/?p=5001


That is not a fair characterisation of the comment. For one thing, it says "could have", not "would have" - this strongly implies that it's a reference to the technical content, and not the existence or style of the report.


Elapsed time until someone used this article as an excuse to complain about the author, derailing the thread into non-struct-related flaming: about two hours.

That's longer than I was expecting! I think 2014 is going to be a good year!


Yup, ESR is the embodiment of the American exceptionalism, white, male, heterosexual privilege resident in the hacker community, and the sooner that we move beyond his worldview (while still observing he is technically astute) the better.


> while still observing he is technically astute

Is he? I'm not trying to be insulting, but I honestly can't think of anything that he's done that I might be impressed by.


I strongly suspect this depends on how one comes across C. In my experience, people who learn C as a first language seem to develop a much deeper intuition for these subtleties than, say, someone who has cut their chops in higher-level languages and only then progresses (regresses? :)) to C.


I think this is a pretty good assessment. I learned C first, so a lot of this stuff comes naturally to me. On the other hand, a friend of mine learned Java first. Now he's in a Master's program and doing C, and he's constantly asking me about low level stuff like byte alignment.


His definition of "lost art" can be summed up as "can't be Googled". I think that is fair, at least when it comes to programming.


For a guy who thinks that x86 is a modern architecture and all architectures that can only do aligned memory access are "older"? I guess it is.


If you go back to 1993, SPARC, PA-RISC, MIPS, and Alpha all seemed like serious contenders for the next big CPU architecture. To my knowledge, none of them supported unaligned access (though I'm sure someone will point out a counterexample). 20 years later, all of them are dead and x86_64 and ARM are the new "modern architectures." x86_64 supports unaligned access and ARM has been gradually gaining support for it.

So yes, chips that can only do aligned memory access do tend to be older.


If you go back to 1992 you will find PowerPC which is still around. Same as MIPS.

Listen, I know that there are people who worked into retirement without ever seeing anything other than x86 but a) this shows how old x86 is and b) does not give them expertise over different ISAs.


I guess MIPS and PowerPC are still around somewhere, but it's definitely a lot less common to see them these days. For example, it used to be that every Mac came with a PowerPC, and so people were aware of the issues that chip had. Nowadays it's rare to find a programmer experienced with anything but x86. That was really Eric's point when he made that comment.


It's less common because they're hidden inside other equipment. E.g. my RAID cards have PPC cpu's; my set-top box used to be MIPS. They're both licensing hundreds of millions of cores a year.


To nitpick about all of them being dead:

MIPS is still in the top 4 or so CPU architectures by volume, with x86, PPC, and ARM, with ARM as by far the largest (forecast about 3 billion cores licensed last year).

At least a few hundred million MIPS cores are shipped every year. Depending on what you think embedded x86 volume is, it may be ahead of x86. Of course by revenue x86 still beats everyone.


Intel actually put in a feature to trap on unaligned access starting with the 486 (and it still exists today) - look up the AC "alignment check" flag. Of course almost no one used that feature, since x86 supported unaligned access from the very beginning. I think this flexibility is one of the reasons why it's stayed competitive: lots of features initially, not all of them optimal (e.g. the division instruction on the 8088 was extremely slow), then improvements with each new microarchitecture, so existing software benefits from them. In contrast, an architecture that didn't support unaligned access would gain little for existing software if faster unaligned-access instructions were introduced, since they wouldn't be used.


MIPS ain't dead, I'm working on a MIPS device right now :)


> x86 is a modern architecture

It's a modern ISA, implemented by modern microarchitectures.


I can read and write C, but I didn't know much about this, and I was only vaguely aware that bitfields exist.


It's 2014, these days I'm gonna order struct members so members that logically belong together are adjacent, and spaced apart from other members!


This could be partially automated with `__attribute__((__packed__))` and a bit of -fipa-struct-reorg for better cache performance. Sadly, there's no `__reorder_yes_i_know_and_i_want_to_violate_c_standard__` attribute of any kind. But I really believe managing and optimizing memory layout (unless it's explicitly necessary, as when declaring serialization formats) should be the compiler's job, not the human's.


Adding just the __packed__ attribute and not also an __aligned__ attribute is not necessarily a good idea -- if a struct is marked __packed__ gcc assumes it might be at any alignment, so on architectures which don't permit unaligned loads it may have to generate a lot of byte loads and shifts to do simple reads of integer members.


Fwiw, -fipa-struct-reorg was removed in GCC 4.8.x. From the release notes:

The struct reorg and matrix reorg optimizations (command-line options -fipa-struct-reorg and -fipa-matrix-reorg) have been removed. They did not always work correctly, nor did they work with link-time optimization (LTO), hence were only applicable to programs consisting of a single translation unit.


I'm not sure letting the compiler go wild would be such a great idea: one of the strengths of C is predictable performance, which would be hard to obtain if the compiler is allowed to e.g. move data across cache lines.


If you're running on one microarchitecture, I agree. But if you're running on more than one, manual structure packing may actually give more unpredictable performance than letting GCC handle it. At least GCC will make an effort to optimize for each one and hopefully avoid things that are absolutely terrible to do on that arch (orders-of-magnitude performance loss type stuff), while a manually chosen packing optimized for one microarch can be hugely pessimal on another one.

That's a guess though, no numbers. :)


He should mention this tool:

http://linux.die.net/man/1/pahole


You beat me to it. =) Here is an introductory article http://lwn.net/Articles/335942/


gcc -Wpadded is very useful, too (its chief downside is that you don't usually want to combine it with -Werror - padding is not always bad.)


You can always add -Wno-error=padded to disable -Werror just for -Wpadded


I seem to have missed that trick. Thanks!


You could always pad manually to clear the warnings.


If your struct covers more than one cache line, you may want to think about which members are accessed together and put those in the same cache line. E.g. if you manage to fit all frequently accessed members in the first cache line, you will bring likely useful data into the cache when you access any of them. At the same time you avoid cache pollution with not so useful data from other cache lines.


For people like myself who need a short reminder what a "cache line" is:

* http://en.wikipedia.org/wiki/CPU_cache#Cache_entries

* http://stackoverflow.com/questions/14707803/line-size-of-l1-...


Classic use case: creating a packed structure to directly read TCP packets. There is still the "convert from big to little endian" problem, but I think you can use GCC's method for individually packing a struct:

  struct TCP_Packet {
    uint16_t source_port;
    uint16_t dest_port;
    uint32_t seq_no;
    uint16_t flags;
    uint16_t window_size;
    uint16_t checksum;
    uint16_t urgent;
    ...
  }  __attribute__ ((packed));


many packet/wire formats are conceived as structs when they are designed


Note that the key takeaway is just to order struct members by size, largest first; this isn't always sufficient, but it's a good habit wherever performance could matter.


Yeah. It's an incredibly verbose document, with that key insight buried in a mere two sentences near the end:

> The simplest way to eliminate slop is to reorder the structure members by decreasing alignment. That is: make all the pointer-aligned subfields come first, because on a 64-bit machine they will be 8 bytes. Then the 4-byte ints; then the 2-byte shorts; then the character fields.


Just a feature idea: I'd love an IDE plugin (for something like visual studio or eclipse) that would show this sort of information. Sort of like a disassembly, but for structs instead. I can imagine lots of industries where they'd pay for that sort of thing (IE, game programmers or OS developers, etc.)


It's not exactly what you're looking for, but clang can give you record dumps of C and C++ structs and classes. (In general, clang has lots of interesting tooling for C and C++, and I highly recommend looking into using it).

I wrote here [1] about using it in a KDE program (KStars) to cut 10% off the memory usage. In particular you can also use this tool to see some of the many bad effects of deeply nested classes: your class may be 8-byte aligned, so a 'char' at the end can take up eight bytes with padding, and it's not possible (AFAIK) to rearrange fields up and down inheritance hierarchies [2].

[1]: http://www.hdevalence.ca/blog/2013-12-22-better-living-throu...

[2]: C++ is not my favorite programming language.


Embedded IDEs already do this (showing the offsets of elements in a structure in a column when you are viewing structs).


Do you have some screenshot examples? I like the idea of this feature.


Neat, what IDE?


__attribute__((packed)) in GCC is probably worth mentioning if you only want to apply it to selected structs.


#pragma pack(push)

#pragma pack(1)

MSVC


GCC also supports that pragma, for compatibility with MSVC:

http://gcc.gnu.org/onlinedocs/gcc/Structure-Packing-Pragmas....

Clang has this, too.


And MSVC doesn't support GCC's pragma, so for optimal portability #pragma pack(1) is my preferred way of doing it.


#pragma pack(push, 1) // :)


small typo, under 4 it says a 54-bit machine.

small remark: this will cover dates to 2050

that's a bit of a gamble, not sure I would have done that myself. It's probably ok, but you never know: in 40 years CVS and your code might still be in use somewhere, and you'll cause some people headaches :]


It's especially weird as an optimization when an unsigned int would last until 2118. (Or 2106 if the weird base-changing was removed)


excellent point.


In this particular case, it's not that big of a deal. As I understand it, this is an in-memory format that does not outlive program execution. One can safely assume that, even if CVS still exists in 40 years, no one will be using a 40-year-old version of this program by then.


One could also error out in this case; which is better than assuming!


As I understand it, he needed the compression to make the code work at all in the available memory. Since this is a source code distribution, presumably it will be pretty easy to fix in thirty years if the code is still being used. (Presumably we'll all have at least 256GB RAM available in 2043...)


Interesting timing: I've been playing around with Cortex M chips, and a lot of the demo code has structures that are almost pathologically misaligned, things like:

   struct SomeCamelCaseNonsenseStructure {
       uint8_t flag;
       uint32_t pointer;
       uint8_t another_flag;
       uint32_t another_pointer;
       ...
   };
I feel like flinching every time I read it.


For those of you playing along at home, Rust currently manually interoperates with C structures. This[1] is one of the struct stat structures:

    pub struct stat {
        st_dev: dev_t,
        __pad1: c_short,
        st_ino: ino_t,
        st_mode: mode_t,
        st_nlink: nlink_t,
        st_uid: uid_t,
        st_gid: gid_t,
        st_rdev: dev_t,
        __pad2: c_short,
        st_size: off_t,
Note the __pad elements.

[1] https://github.com/mozilla/rust/blob/master/src/libstd/libc....


> If the compiler happened to map [the first member of a struct] c to the last byte of a machine word, the next byte (the first of p) would be the first byte of the next one and properly pointer-aligned.

No, a compiler can't do this. You have to assume that the first member of a struct will be machine word aligned. I'm sure there are many reasons, but the one I can think of now is that structs can be dynamically allocated. That means it has to be possible to take the return value of malloc() and assign it to a pointer, and there is no way to get malloc() to return a memory block with that weird alignment (starting on the last byte of a machine word).


Coming from an 8-bit assembly background, I've always found this alignment size/speed tradeoff to be a little vexing (either it's slower, or we're wasting memory), and thought "couldn't the hardware be better?" so finding out that x86 has evolved to the point where unaligned accesses basically have no penalty was really pleasing. No more bytes wasted, so we can have small and fast. (How do they do this? By making the memory bus a lot wider than a word, among other things.) IMHO ARM is just starting to see the value of this and catching up, if only the other architectures would do the same...


Is there an easy way to get the compiler to show what padding it's using for a structure? (other than filling the structure with marker values then dumping the data and picking through it with a hex editor)


The pahole(1) utility, mentioned above.


>It’s still 24 bytes because c cannot back into the inner struct’s trailing padding. To collect that gain you would need to redesign your date structures.

Well that's disappointing. Is there any [nonstandard] way to get a compiler to take advantage of that space?


I tend to only use packed structures when I need to match a binary format or know I'll create huge arrays of structures.

For the rest I blindly trust the compiler to choose optimal boundaries and alignments.


The problem with that approach is that the compiler cannot do any optimization of struct field layout; it's completely specified by the platform ABI and the order in which the fields are declared.


The compiler can't rearrange your structure data for optimal packing, because it doesn't know if you rely on the ordering of it. I agree that structure packing is mostly unnecessary optimization except for a few high performance scenarios where cache miss frequency has a huge impact.


This is still very practical when considering disk layouts of the structures. You want something compact enough on disk, but when it comes into memory, you still benefit from aligned access.


There are still statically typed languages in use today, like Ada, although even within Ada you'll often see programmers package spare buffers into structures to increase flexibility.


This is very useful when programming microcontrollers. Especially if you create many structs in an array. Many parallel state machines or something like that.


this is indeed somewhat lost. i learned it from coding standards at one place a long time ago... but since then I've had to explain it x number of times to juniors to the point where I wrote an article to point them at instead:

http://jheriko-rtw.blogspot.co.uk/2011/02/know-your-cc-struc...


I recently had to do this (padding) as I was using memcpy to load a struct from reading a binary file for a personal project.


Why does "sizeof(struct foo5)" in packtest.c return 8 instead of 6?


This is the kind of stuff they teach you in college with a CS degree.


It really shouldn't be. Struct packing is a purely technical skill, not really computer science at all; there's no general insight you gain from understanding it, and it's easy to look up and teach yourself if you need to learn it later. (Not that most programmers will ever need to do it; these days even many embedded systems have enough memory and cpu performance that this kind of microoptimization is a poor use of programmer time)


Not universally. None of my university's Java based curriculum or my brother's university's Python based curriculum mentioned this. My community college's C++ based curriculum mentioned it in passing.

I seem to remember it being mentioned in Bruce Eckel's excellent "Thinking in C++".

That said, I'm kinda curious how much this affects Objective C objects.


Never mentioned once in my curriculum at all...




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: