
Zero size arrays in C - ashitlerferad
http://www.labbott.name/blog/2016/05/10/zero-size-arrays-in-c
======
kazinator
A zero sized array declaration was a constraint violation (thus requiring a
diagnostic) in ISO C 90 in all contexts: even as a function parameter, where
the identifier being declared is actually a pointer and not an array at all!

C99 added the "flexible array member" feature: the last element of a struct
can be an array of size zero: "As a special case, the last element of a
structure with more than one named member may have an incomplete array type;
this is called a _flexible array member_." [C99, 6.7.2.1 ¶16]

In C90 code, the "struct hack" is implemented using an array of size [1] at
the end of a struct. Some compilers allowed zero (like GNU C, I think) before
C99.

In any case, supporting a [0]-sized array member which is not the last member
of a struct, without issuing a diagnostic, is non-conforming. The example
given in the article with consecutive [0] arrays requires a diagnostic.

By the way, the (apparently only?) advantage of the flexible array member is
that we can use sizeof (type) to obtain the size of the structure just
excluding the first element of the array. Whereas with the C90 style struct
hack, we must use offsetof(type, last_member) so that we exclude [1] from the
calculation:

    
    
      struct foo {
        /* ... */
        int array[ZERO_OR_ONE];
      };
    
      /* Correct in C99, if ZERO_OR_ONE is 0
         Incorrect in C90, (ZERO_OR_ONE can't be zero).  */
      size_t foo_plus_3_elems_sizeof = sizeof (struct foo) + 3 * sizeof(int);
    
      /* Correct in C99 whether ZERO_OR_ONE is 0 or 1.
         Correct in C90 with ZERO_OR_ONE being 1. */
      size_t foo_plus_3_elems_offsetof = offsetof (struct foo, array) + 3 * sizeof(int);
    

Here, "incorrect" means we calculate slightly more storage than needed,
usually without any downside.

"Correct in C90" means _de facto_ correct, not that it was well-defined
behavior. "Everyone" was doing it.
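
A minimal sketch of the two size calculations for the C90-style struct hack. The names struct hack_buf, hack_size_via_sizeof and hack_size_via_offsetof are hypothetical, and the asserts are only an assumed overflow guard, not part of the idiom itself:

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>  /* SIZE_MAX */

/* C90-style struct hack: a one-element array as the last member. */
struct hack_buf {
    size_t len;
    int array[1];
};

/* Header plus n ints, computed from sizeof. This also counts the
   placeholder element and any tail padding, so it over-allocates. */
size_t hack_size_via_sizeof(size_t n)
{
    assert(n <= (SIZE_MAX - sizeof(struct hack_buf)) / sizeof(int));
    return sizeof(struct hack_buf) + n * sizeof(int);
}

/* Header plus n ints, computed from offsetof. This stops exactly
   where the array begins, so the placeholder is not counted. */
size_t hack_size_via_offsetof(size_t n)
{
    assert(n <= (SIZE_MAX - offsetof(struct hack_buf, array)) / sizeof(int));
    return offsetof(struct hack_buf, array) + n * sizeof(int);
}
```

The sizeof version is larger by at least one int, which is the "slightly more storage than needed" case.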

~~~
comex
I think you made a small mistake: an "incomplete array type", as allowed by
the clause you cited, is one with empty brackets (int foo[]). Per 6.7.5.2.4:

> If the size is not present, the array type is an incomplete type.

This is different from specifying the size as 0, which is still not allowed,
per 6.7.5.2.1:

> In addition to optional type qualifiers and the keyword static, the [ and ]
> may delimit an expression or *. If they delimit an expression (which
> specifies the size of an array), the expression shall have an integer type.
> If the expression is a constant expression, it shall have a value greater
> than zero.

Oh, and for the record, while you may be alluding to this in your last
paragraph: another advantage of flexible array members is that indexing them
is actually well-defined per the spec (if enough space has been allocated, of
course), while AFAIK the spec has never added a special case for length-1
arrays, so accessing the "extra elements" is technically undefined behavior.
However, common compilers like GCC do treat length 1 specially, as an
exception to optimizations that generally assume you won't index out of
bounds, so 'technically' really is 'technically' - it's not something that
some random future version of GCC is likely to break.
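
For reference, a small sketch of the C99 form with empty brackets, where indexing within the allocated space is well-defined. The type int_vec and both helper functions are hypothetical names made up for illustration:

```c
#include <assert.h>  /* assert */
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc, free, size_t */

/* Hypothetical length-prefixed vector of ints. */
struct int_vec {
    size_t len;
    int items[];   /* C99 flexible array member: empty brackets, no size */
};

/* Allocate a vector with room for n ints, all set to zero.
   Returns NULL on allocation failure or size overflow. */
struct int_vec *int_vec_new(size_t n)
{
    struct int_vec *v;
    size_t i;

    if (n > (SIZE_MAX - sizeof *v) / sizeof v->items[0])
        return NULL;
    v = malloc(sizeof *v + n * sizeof v->items[0]);
    if (v == NULL)
        return NULL;
    v->len = n;
    for (i = 0; i < n; i++)
        v->items[i] = 0;
    return v;
}

/* Store value at index i; the index must be within the allocated length. */
void int_vec_set(struct int_vec *v, size_t i, int value)
{
    assert(v != NULL && i < v->len);
    v->items[i] = value;
}
```

The caller releases the whole object with a single free(v).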

~~~
kazinator
Even though I got the quote right out of C99, I still mixed it up.

The reason for this is that ... I don't use flexible array member myself _or_
the [0] GCC extension, either.

Flexible array member doesn't seem to bring anything to the table other than
formal definedness of behavior. I can't imagine any implementation which
supports flexible-array member in its C99+ modes, which makes [1] fail in the
same modes or in the C99 mode. In any old compiler that doesn't have C99
support, that formally undefined struct hack is all you have.

I don't mind using offsetof(type, member) to obtain the size of the header
before the array, rather than sizeof(type).

The classic hack is portable to C90 and C++98, making it good for "Clean C".

That said, flexible array member is elegant in the sense that it just uses an
incomplete array type similarly to a file scope array declaration. Using [1]
is a bit like while (1) instead of for (;;). I don't like the [1] in the
struct hack, but I don't use C for its beauty.

------
jandrese
That seemed pretty straightforward to me. A zero size array is better thought
of as a pointer to that part of the struct, and if you put two back to back
you get two pointers to the same spot. I'm not sure what that guy was
expecting to happen.

The usual reason to use them is when you have a protocol header with a
variable length data portion. Putting the zero length array in the struct
allows you easy access to the data while not changing the sizeof() the header
so you can avoid a bit of pesky pointer math and make the code easier to read.
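
The protocol-header case could look roughly like this. The struct packet layout and the packet_new helper are assumptions invented for the sketch, not any real protocol, and it uses the C99 spelling ([]) where GNU C would also accept [0]:

```c
#include <assert.h>  /* assert */
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* memcpy */

/* Hypothetical wire format: a fixed header followed by payload bytes. */
struct packet {
    unsigned short type;
    unsigned short length;     /* number of payload bytes */
    unsigned char payload[];   /* data starts right after the header */
};

/* Build a packet by copying length bytes from data behind the header.
   Returns NULL if the allocation fails. */
struct packet *packet_new(unsigned short type, const void *data,
                          unsigned short length)
{
    struct packet *p;

    assert(data != NULL || length == 0);
    p = malloc(sizeof *p + length);   /* sizeof *p covers the header only */
    if (p == NULL)
        return NULL;
    p->type = type;
    p->length = length;
    if (length > 0)
        memcpy(p->payload, data, length);
    return p;
}
```

Here p->payload names the data directly, so no pointer arithmetic past the header is needed at the call site.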

Thinking about this some more, you could also use this as a ghetto form of
union, but I'm not sure why you would want to.

    
    
      struct
      {
        char theBytes[0];
        int  theValue;
      };

~~~
_kst_
> A zero size array is better thought of as a pointer to that part of the
> struct, ...

An array of any size is best thought of as an array. Arrays are not pointers.
(See section 6 of the comp.lang.c FAQ,
[http://www.c-faq.com/](http://www.c-faq.com/).)

A zero size array is best thought of as illegal (if you want portability) or
as a compiler-specific extension (if you don't mind depending on a particular
compiler, perhaps gcc).

~~~
jandrese
Technically correct, but using the zero length array trick creates clearer,
more self documenting code. It's a tradeoff between readability and
portability.

~~~
_kst_
> Technically correct,

Or, as I prefer to say, "correct".

> but using the zero length array trick creates clearer, more self documenting
> code. It's a tradeoff between readability and portability.

Clearer compared to which alternative? I find C99-style flexible array members
(defined with "[]") quite clear.

------
theseoafs
> C makes it very easy to get things subtly wrong. Yet another item to add to
> your code review checklist. Better yet, don't use zero size arrays unless
> you really have to.

You don't have to. C has flexible arrays in structs for this, and they
actually work.

~~~
TazeTSchnitzel
C99, yes. Earlier versions didn't have that.

------
Paul_S
And this is why in 2016 I'm still stuck with ANSI C. First person to tell me
ANSI C is not C89 anymore wins a prize! I love C but I can't think of any
other language that failed to gain traction as it got updates. Maybe similar
to failure of Python 3 I guess but even Python 3 is getting reasonable uptake.

~~~
_kst_
ANSI C is not C89 anymore. Where's my prize? 8-)}

The term "ANSI C" is still very commonly used to refer to the language
described by the 1989 ANSI C standard. This usage is strictly incorrect, but
too firmly entrenched to ignore.

The 1990 ISO C standard describes the same language, and was officially
adopted by ANSI, making the 1989 standard obsolete. The 1999 and 2011 ISO
standards, both also officially adopted by ANSI, each officially made all
earlier standards obsolete. If you want to refer to the language defined by
the 1989 ANSI C standard (and described in K&R2), call it "C89" or "C90".

C11 might get more traction than C99 did, since gcc defaults to C11 plus GNU
extensions starting with release 6. (But Microsoft's C compiler is still
behind the times.)

~~~
Paul_S
All the compilers I ever used in my professional career could do C99 but most
companies limit themselves to ANSI C (good luck convincing anyone that the
term is incorrect!) for "compatibility" reasons and it's got nothing to do
with Microsoft (who uses their compilers for C anyway?).

Your prize for pointing out that the usage of ANSI C is incorrect is:
disappointment in humanity.

~~~
to3m
I used to use VC++ to build C code sometimes. People write software in C, and
if you use VC++ you might want to compile it. C and C++ are not the same, and
the fixing-up process can be time-consuming and error-prone, even assuming the
code didn't go full C99 with anonymous aggregates and designated initializers.

Fortunately, these days VS2013+ do an OK job of C99.

------
adrianN
I'm reasonably sure that in C++ at least accessing an element outside the
array is always UB. Is this legal in some sufficiently modern C?

~~~
joosters
The struct would be stored in a larger area of memory that has previously been
malloc'd (to allow room for the data that you want to put after the header).
So long as accesses to the data are before the end of the malloc, it won't be
out of bounds and so won't be undefined behaviour.

~~~
to3m
Prior to its deletion for reasons unknown, there was a post that (reading
between the lines) was calling you out on the term "access", which doesn't
appear to encompass writing:
[http://port70.net/~nsz/c/c11/n1570.html#3.1](http://port70.net/~nsz/c/c11/n1570.html#3.1)

I'd class that as a valid quibble, if my interpretation is correct, but a
quibble nonetheless. The general intent is clear enough from the standard, I
think:
[http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p18](http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p18)

Further reading on the subject of undefined behaviour (apropos of nothing in
particular):
[http://port70.net/~nsz/c/c11/n1570.html#3.4.3p1](http://port70.net/~nsz/c/c11/n1570.html#3.4.3p1),
[http://robertoconcerto.blogspot.co.uk/2010/10/strict-aliasing.html](http://robertoconcerto.blogspot.co.uk/2010/10/strict-aliasing.html),
[https://groups.google.com/forum/#!msg/boring-crypto/48qa1kWignU/o8GGp2K1DAAJ](https://groups.google.com/forum/#!msg/boring-crypto/48qa1kWignU/o8GGp2K1DAAJ),
[http://blog.metaobject.com/2014/04/cc-osmartass.html](http://blog.metaobject.com/2014/04/cc-osmartass.html)

~~~
joosters
Your second link seems to agree with my original post, i.e. it is not UB. Read
on to point 20:
[http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p20](http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p20)
and it talks about my example of putting the struct in a malloc'd area of
memory with space after the struct.

------
georgeecollins
This brings back so many memories of programming in C / C++ with pointers.

Good times.

~~~
curiousgal
What are you up to now?

------
hiimnate
What is the benefit of using a zero sized array over a pointer?

~~~
halayli
you can do

x = malloc(sizeof *x + <variable_length>);

x->zero_array points to the variable-length data.

The advantage is that you get contiguous memory access and do a single malloc.

As opposed to:

x = malloc(sizeof *x);

x->ptr = malloc(<variable_length>);

x and ptr are not necessarily contiguous.

~~~
hiimnate
But can't you write it so that it is contiguous using a pointer?

~~~
jmpeax
You can

x = malloc(sizeof(*x) + <variable_length>);

x->ptr = (char *)x + sizeof(*x);

~~~
imron
Except it requires sizeof a pointer more memory and an extra assignment after
the malloc, which is why people use the zero-sized array instead (note this
comment was to explain for the grandparent why you would normally prefer a
zero-sized array instead).
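
A side-by-side sketch of the two layouts, assuming a C99 compiler. The names with_ptr, with_fam and the helper functions are hypothetical:

```c
#include <assert.h>  /* assert */
#include <stdlib.h>  /* malloc, free, size_t */

/* Pointer variant: the struct stores an explicit pointer to the data. */
struct with_ptr {
    size_t len;
    char *data;
};

/* Flexible array variant: the data simply follows the header. */
struct with_fam {
    size_t len;
    char data[];
};

/* One malloc, but the pointer must be set by hand afterwards. */
struct with_ptr *with_ptr_new(size_t n)
{
    struct with_ptr *x = malloc(sizeof *x + n);
    if (x == NULL)
        return NULL;
    x->len = n;
    x->data = (char *)x + sizeof *x;   /* the extra assignment */
    return x;
}

/* One malloc, nothing else to set up. */
struct with_fam *with_fam_new(size_t n)
{
    struct with_fam *x = malloc(sizeof *x + n);
    if (x == NULL)
        return NULL;
    x->len = n;
    return x;
}

/* Invariant of the pointer variant: data points just past the header. */
void with_ptr_check(const struct with_ptr *x)
{
    assert(x != NULL && x->data == (const char *)(x + 1));
}
```

On typical implementations sizeof(struct with_ptr) exceeds sizeof(struct with_fam) by at least the size of the pointer, which is the extra memory mentioned above.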

------
janci
is there a difference between

struct{ int something; int someArray[]; }

and

struct{ int something; int someArray[0]; }

?

~~~
kazinator
Yes; the latter one is the "flexible array member" which is supported in ISO
C99 and later. The former is a constraint violation requiring a diagnostic.

~~~
timmaxw
Did you mean the other way around? I think someArray[] is a flexible array
member and someArray[0] is a constraint violation.

~~~
sigjuice
You are right. OP has it backwards. gcc treats someArray[0] almost the same as
someArray[].

[https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gcc/Zero-Length.html#Zero-Length](https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gcc/Zero-Length.html#Zero-Length)

Other compilers might have similar treatment for someArray[0].

~~~
tavert
Just don't try to have both a zero sized array and a flexible array in the
same struct. g++ 5 or earlier will compile that, but g++ 6 will not:
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550)

------
known
Isn't the pointer a zero size array?

------
0x0
I think you dropped this: \

¯\\_(ツ)_/¯

~~~
ivanca
Some software just deletes all backslashes instead of handling them properly;
in this case that would mean converting the backslash to an HTML entity (&#92;).

