

How Structs Really Work in C - pprov
http://tenaciousc.com/?p=3184

======
mansr
People like the author of this should be banned from the Internet. Almost
everything in that post is wrong or misleading.

 _1) In C, a struct maps to memory_

A struct maps to memory, sure, but not in a defined way. For popular machine
architectures, there is usually an ABI standard which specifies the mapping.
For less popular ones, each compiler does whatever it wants.

It is correct that a struct representation may include padding, but the author
makes several mistakes explaining it. Firstly, there may never be padding at
the start of a struct, which is the _only_ rule dictated by the C standard.
Secondly, he fails to mention _why_ padding is sometimes used (to achieve
natural alignment for all members). Finally, he suggests using implementation-
specific pragmas or flags to pack the struct without padding.
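
To make the padding point concrete, here is a minimal sketch. The struct `sample` and the two helper functions are hypothetical names invented for illustration, and the amount of padding you see depends on the ABI in use.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint8_t, uint32_t */

/* Hypothetical struct, for illustration only. */
struct sample {
    uint8_t  tag;     /* 1 byte */
    uint32_t value;   /* usually wants 4-byte alignment, so padding may follow tag */
};

/* Guaranteed by the standard: nothing precedes the first member. */
static_assert(offsetof(struct sample, tag) == 0,
              "no padding before the first member");

/* Number of padding bytes the implementation added to struct sample. */
size_t sample_padding(void)
{
    return sizeof(struct sample) - (sizeof(uint8_t) + sizeof(uint32_t));
}

/* Members are laid out in declaration order, so value starts after tag. */
int sample_layout_ok(void)
{
    return offsetof(struct sample, value) >= sizeof(uint8_t);
}
```

On many common ABIs `sample_padding()` would report 3 bytes, inserted after `tag` so that `value` is naturally aligned, but that number is not something the C standard promises.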

Apart from padding, there is a further problem with this approach which the
author fails to mention: byte order. If the data file was written with a
different byte order than the machine executing this, all multi-byte values
will come out wrong.

As another commenter points out, the correct solution to all of the above is
to perform explicit marshalling between the struct and a serialised format.
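
A rough sketch of what explicit marshalling can look like follows. The `record` struct, its two fields, and the choice of little-endian as the serialised byte order are all assumptions made for illustration; the article's actual file format is not shown in this thread.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical in-memory record; the compiler may pad it however it likes. */
struct record {
    uint16_t width;
    uint32_t length;
};

/* Serialised size: 2 + 4 bytes, with no padding on the wire. */
enum { RECORD_WIRE_SIZE = 6 };
static_assert(RECORD_WIRE_SIZE == sizeof(uint16_t) + sizeof(uint32_t),
              "wire size is the sum of the field sizes");

static void put_u16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint16_t get_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write the record field by field, in a fixed byte order. */
void record_encode(const struct record *r, uint8_t out[RECORD_WIRE_SIZE])
{
    put_u16le(out, r->width);
    put_u32le(out + 2, r->length);
}

/* Read the record back, independent of host padding and endianness. */
void record_decode(struct record *r, const uint8_t in[RECORD_WIRE_SIZE])
{
    r->width  = get_u16le(in);
    r->length = get_u32le(in + 2);
}
```

Because every byte is placed by shifts rather than by copying the struct's memory, the serialised bytes come out the same on big-endian and little-endian hosts, and struct padding never reaches the file.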

 _2) In C, structs are allowed to "run off the end" [...] As long as a symbol
is backed by real memory, you can do what you want with it -- including
running it past its boundaries_

Nothing could be more wrong. C very explicitly disallows accessing an address
outside a declared object, array, or dynamically allocated block (malloc).
Even computing such an address is forbidden, with the sole exception of the
address one past the end of an array, which may be computed but not
dereferenced. The errors resulting from breaking these rules are often very
hard to pinpoint.

The zero-length array suggested is also in violation of the standard, which
requires that arrays have a positive size. In C99, structs are allowed to end
with a "flexible array member", which is an array with no declared size at
all. This array can then be accessed as though it has as many elements as will
fit before the end of the containing object or dynamically allocated block.
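
As a sketch of the C99 flexible array member described above: the `blob` struct and the `blob_new` helper are hypothetical names, not taken from the article.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc */
#include <string.h>   /* memcpy */

/* Hypothetical header-plus-payload struct. */
struct blob {
    size_t len;
    unsigned char data[];   /* flexible array member: last member, no size */
};

/* Allocate a blob holding a copy of len bytes from src.
   Returns NULL on allocation failure or size overflow. */
struct blob *blob_new(const void *src, size_t len)
{
    assert(src != NULL || len == 0);

    if (len > SIZE_MAX - sizeof(struct blob))
        return NULL;

    /* sizeof(struct blob) does not include the array elements,
       so the payload space is added explicitly. */
    struct blob *b = malloc(sizeof *b + len);
    if (b == NULL)
        return NULL;

    b->len = len;
    if (len > 0)
        memcpy(b->data, src, len);
    return b;
}
```

A caller would release the whole thing with a single `free(b)`, since header and payload live in one allocation.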

Declaring a 1-element array and accessing beyond the end of it is invalid even
if the resulting address is otherwise within the containing object. Violating
this will, again, lead to subtle errors which are hard to find.

 _3) In C, you can compute an offset within a struct [...]_

While the null pointer casting suggested here usually works, offsetof() is the
preferred method, and the author even mentions this in a footnote. An
important distinction not mentioned is that offsetof() must expand to an
integer constant expression, which the address-of expression is not. This
means that only the former may be used where an integer constant expression is
required, such as (static) array sizes and case labels (using a struct member
offset for either of those seems rather unlikely, of course).
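
For what it's worth, the constant-expression property can be shown in a few lines. The members of `struct s` below are assumed for illustration (only `thing` is named in the thread), and `scratch`, `scratch_size` and `member_at` are made-up names.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Assumed layout; only the member name `thing` comes from the thread. */
struct s {
    char tag;
    int  thing;
};

/* offsetof() is an integer constant expression, so it is usable here... */
static_assert(offsetof(struct s, tag) == 0, "first member is at offset 0");

/* ...as the size of an array with static storage duration... */
unsigned char scratch[offsetof(struct s, thing) + sizeof(int)];

size_t scratch_size(void)
{
    return sizeof scratch;
}

/* ...and in case labels. */
const char *member_at(size_t offset)
{
    switch (offset) {
    case offsetof(struct s, tag):   return "tag";
    case offsetof(struct s, thing): return "thing";
    default:                        return "none";
    }
}
```

Substituting the null-pointer cast for `offsetof()` in any of these three places is not guaranteed to compile, although some compilers accept it as an extension.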

------
burgerbrain
Meta comment:

 _"You don't know it yet, but you're...wrong."_

I really _really_ hate it when a posting starts off like this, then proceeds
to tell me something _anyone_ who considered themselves a C programmer would
know.

Want to write a post about something kind of neat that people outside the
field might not be up on? Cool, have fun. But don't start it out by being a
rude ass. And this guy dedicates the first hundred words or so of his article
to it.

------
Todd
Most seasoned C programmers understand every point in this post. Points 2 and
3 are indeed clever tricks, but they're tricks that you eventually have to
employ if you're doing low level work. It's part of the beauty and allure of
C.

------
keeperofdakeys
_Many compilers insert padding bytes into the struct to ensure the fields are
byte-aligned. What's worse, they often do this by default._

The way this guy explains it, he makes it sound _optional_, when it isn't.
On pretty much every piece of hardware, you can't use misaligned addresses.
Compilers make a choice between time and effort: it is much easier to just pad
a struct than to try to juggle elements around until it takes up the least
space. The onus is on the programmer to order their structs properly if they
really care heavily about memory.
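
A small sketch of what ordering members by hand buys you. Note that C requires members to be laid out in declaration order, so the compiler could not do this reordering even if it wanted to. The struct names `loose` and `tight` are made up, and the sizes in the comments are what common ABIs typically produce, not guarantees.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Small members on either side of a large one: padding after a and after c. */
struct loose {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
};                      /* typically 12 bytes */

/* Same members, largest first: the two small ones share one padded slot. */
struct tight {
    uint32_t b;
    uint8_t  a;
    uint8_t  c;
};                      /* typically 8 bytes */

static_assert(offsetof(struct tight, b) == 0,
              "first member is never preceded by padding");

size_t loose_size(void) { return sizeof(struct loose); }
size_t tight_size(void) { return sizeof(struct tight); }
```

Sorting members from largest to smallest alignment is a common rule of thumb; it keeps natural alignment without resorting to packing pragmas.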

------
p9idf
"memcpy(&v, video_loc_in_memory, sizeof(struct video_file));

"Well, here's the problem: if padding bytes are involved, you could be
entering a world of pain and not even know it. [...] The solution is to use a
compiler switch or a #pragma pack to ensure that structs are packed (i.e., no
padding applied)."

That's an implementation-defined hack. The correct solution is to explicitly
marshal your data between its in-memory format and the required external
representation.

------
javadyan
Any decent C programmer should know this stuff.

Also,

> size_t offset = (size_t) &(((struct s*)0)->thing);

In what universe is that code "ugly"?

------
caf
Padding is not allowed at the start of a struct, and some architectures simply
can't access misaligned objects.
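
If a value really does sit at a misaligned address (inside a packed buffer read from a file, say), one portable way to fetch it is to copy it out with memcpy instead of dereferencing a cast pointer. A minimal sketch, with `load_u32` as a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Read a uint32_t stored at an arbitrary, possibly misaligned, address.
   The bytes are interpreted in host byte order. */
uint32_t load_u32(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* no alignment requirement on p */
    return v;
}
```

Compilers usually turn this into a single load on architectures that tolerate misaligned access and into byte-wise loads elsewhere, though that is an optimisation and not a promise.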

------
rhdoenges
Wouldn't the zero-length array potentially run into other memory?

~~~
notaddicted
It is only fit to be used for heap allocation: you allocate the size that you
need (i.e. malloc(sizeof(struct video_file2) + ARRAY_SIZE)). It wouldn't
work as a local variable.

edit: I suppose one could use it for globals or on the stack by casting a byte
array ... I would find that to be unspeakably ugly but I don't write much C.

~~~
rhdoenges
ah, that's clever.

------
fleitz
Packing a struct can reduce performance due to misaligned reads. If the
compiler decides your struct needs padding it probably does.

