
Strange C Syntax - Tideflat
http://blog.robertelder.org/weird-c-syntax/
======
cjslep
These strange syntaxes are perfect additions for the article "How to write
unmaintainable code", which already has a Duff's device example:

    
    
        switch(count % 8) {
            case 0: do{ putchar('0' + (int)j);
            case 7:     putchar('0' + (int)j);
            case 6:     putchar('0' + (int)j); /* Unrolled
            case 5:     putchar('0' + (int)j);  * for greater
            case 4:     putchar('0' + (int)j);  * speed.
            case 3:     putchar('0' + (int)j);  */
            case 2:     putchar('0' + (int)j);
            case 1:     putchar('0' + (int)j);
        } while(--j > 0);
    

Without syntax highlighting, a passing glance may not reveal that cases 3
through 5 are commented out.

~~~
sago
Seems an odd example: any code with a multiline comment can have the same
issue. That's nothing to do with the strangeness of Duff's device. It would
work the same in a series of function calls or calculations.

And even then it is a matter of experience. It looks odd to me to have a
multiline comment that is not in its own 'paragraph' of the code, so it draws
the eye right away. Perhaps this would foil some programmers, but not more than
once, I'd have thought. In my experience, run-on compound statements are much
more common and harder to spot intuitively:

    
    
        if (foo)
            bar();
            sun();
    

Duff's device is difficult to understand from first principles, but even that
is a bad example of unmaintainable code because a) it looks like nothing
except Duff's device: you only need to see the pattern once or twice and you'd
recognise it, and at least know 'it's that weird pattern for unrolling loops',
and b) it is a performance optimisation that only belongs in code that is
profiled and needs to go that fast. As such it should be well commented to
avoid regressions by well-meaning refactor-zealots. Inline assembly or heavy
intrinsics are more difficult to read than regular C too, so you only use them
when you need to. In my experience manual loop unrolling is very rarely
needed.

~~~
jcranmer
Actually Duff's device is very likely to be a performance hit most of the
time. It will have bad branch predictor behavior and it creates unstructured
code, making it harder for the compiler to reason about and optimize.

Although it should be noted that the original person to use Duff's device did
admit that he tried everything else to optimize it and only that worked, and
he never advocated that people should use it as a general optimization
technique.
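
For reference, a minimal sketch of a working Duff's device, written as a plain memory copy. The name `duff_copy` and the copy-to-buffer behaviour are assumptions for illustration: Tom Duff's original wrote every byte to a single memory-mapped output register rather than advancing the destination.

```c
#include <stddef.h>

/* Copy count chars from src to dst, with the loop body unrolled eight times.
 * Hypothetical helper: unlike the original device, dst is advanced too. */
void duff_copy(char *dst, const char *src, size_t count)
{
    if (count == 0)
        return;                      /* the do-while below runs at least once */

    size_t n = (count + 7) / 8;      /* number of passes through the loop */

    switch (count % 8) {             /* jump into the middle of the loop body */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
```

The first pass handles the `count % 8` leftover elements by entering the loop part-way through; every later pass copies a full eight.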

~~~
sago
That's been my experience too, not just of Duff's device but in general with
trying to unroll loops. There may have been a point where it was practical
(although early in my career I didn't do the profile-first-optimise-later
thing very well, I confess). But if you think you can manually unroll a loop
in a way that beats the branch predictor and a good compiler on most modern
hardware, you're probably fooling yourself.

~~~
nwmcsween
Eh, this is still relevant, loop unrolling can still have a drastic effect if
the compiler cannot optimize.

~~~
sago
I suspect I'm just not in the group of people for whom it is relevant any
more. Way back I was optimising for console hardware when that hardware was
quite rudimentary. Now the hardware is much more sophisticated, I don't need
it.

Can you say what kind of stuff you unroll loops for, and in what kinds of
situations? Just out of curiosity.

------
greenyoda
_" You can typedef a function declaration... and declare function
prototypes..."_

This is actually a very useful technique that I use all the time. It allows
you to make sure that a function and any function pointers that point to it
always have matching types (since you only have to change the prototype in one
place - the typedef).
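
A minimal sketch of the technique, using hypothetical names (`binop_fn`, `add`, `mul`, `apply`) that do not come from the article: the typedef names a function type, prototypes are declared through it, and pointers are spelled `binop_fn *`.

```c
/* binop_fn names a function TYPE, not a pointer type (hypothetical name). */
typedef int binop_fn(int a, int b);

/* Prototypes declared through the typedef; these are ordinary declarations. */
binop_fn add;
binop_fn mul;

/* A definition still has to spell out the full signature; the compiler
 * checks it against the prototype above and rejects a mismatch. */
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* A pointer to such a function is simply binop_fn *. */
int apply(binop_fn *op, int x, int y)
{
    return op(x, y);
}
```

Changing the parameter list in the typedef then forces every prototype, definition, and pointer that uses it to be updated before the code compiles again.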

~~~
TwoBit
Wouldn't you get a compiler error if there was a mismatch?

~~~
Gibbon1
The potential trouble is if the function pointer goes through a void*
wormhole. (All of these types will be lost... like...)

In C you can cast a pointer, okay anything really, to a function pointer and
then call it.

    
    
      int foo(int a, int b){return a+b; }
      void *vptr = foo;
      int (*yolo)(int a) = vptr;
      int c = yolo(10); // Cthulhu come
    

This sort of thing can happen when you pass a function pointer into some sort
of callback mechanism+. When you get it back you have to cast it to the right
type and then call it. So there is a disconnect between the cast and the
function definition, unless you tie both together with a typedef.

\+ It is often helpful for an API that takes a callback to also record who the
caller was and some arbitrary bit of data. The arbitrary bit of data might be a
function pointer.

~~~
jcranmer
Actually, casting between void * and void (*)() is not legal C. You can
convert between pointers of different object types, and you can convert
between pointers of different function types, but you're not allowed to
convert between a pointer of object type and pointer of function type (cf.
N1570, §J.5.7 (which makes it explicit) and §6.3.2.3 (where it's implicit by
omission)).

This is however a very common extension allowed in most compilers, since most
processors that people encounter keep code and data in the same memory address
pool. But on DSPs, it's common for data and code memory to be distinct address
spaces, and the two pointer types aren't necessarily the same size.
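
A portable way to carry a function pointer through a generic slot is to use a function pointer type for the slot instead of `void *`, since converting between function pointer types and back is permitted. The following is only a sketch; `generic_fn`, `callback_slot`, `slot_store`, `slot_call`, and `add_ints` are hypothetical names.

```c
/* Generic "any function" pointer type (hypothetical name). Standard C allows
 * converting one function pointer type to another and back again. */
typedef void (*generic_fn)(void);

typedef int binop_fn(int, int);

/* A callback slot that stores a function pointer without using void *. */
struct callback_slot {
    generic_fn fn;
};

int add_ints(int a, int b) { return a + b; }

void slot_store(struct callback_slot *slot, binop_fn *f)
{
    slot->fn = (generic_fn)f;   /* function pointer -> function pointer */
}

int slot_call(const struct callback_slot *slot, int a, int b)
{
    /* Convert back to the ORIGINAL type before calling; calling through
     * a mismatched function type would be undefined behaviour. */
    binop_fn *f = (binop_fn *)slot->fn;
    return f(a, b);
}
```

Using the same `binop_fn` typedef on both the storing and the calling side keeps the two casts tied to one declaration.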

------
fit2rule
I've found that a great deal of these idioms are explained in the excellent
book "Expert C Programming: Deep C Secrets" by Peter van der Linden. It's one
of my go-to books for when I want to enhance my 30 years of C-programming
experience with a little more insight - I've read it multiple times since it
was published, and always learn something new. Check it out if you want to
dive more deeply into some of these oddities:

[http://archive.arstechnica.com/etc/books/deep-c.html](http://archive.arstechnica.com/etc/books/deep-c.html)

~~~
macintux
The rare technical book that rewards the reader with not merely technical
excellence but also robust humor. I still read parts on occasion even though I
have no need for C these days.

------
m3koval
The bitfield example is misleading. Section 6.7.2.1/10 of the C99 standard
says:

"The order of allocation of bit-fields within a unit (high-order to low-order
or low-order to high-order) is implementation-defined"

There is no guarantee on the order of the bits inside a bitfield. The compiler
may also introduce padding, e.g. for alignment purposes. This makes bitfields
unusable for unpacking binary data.

Unfortunately, you're stuck with shifting and masking to replicate the same
effect.
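
A minimal sketch of the shift-and-mask approach. The 16-bit layout below (4-bit version, 8-bit length, 4-bit flags, sent big-endian) is invented for illustration, as are the names `header` and `unpack_header`.

```c
#include <stdint.h>

/* Hypothetical 16-bit header, transmitted big-endian:
 *   bits 15..12  version (4 bits)
 *   bits 11..4   length  (8 bits)
 *   bits  3..0   flags   (4 bits)
 */
struct header {
    unsigned version;
    unsigned length;
    unsigned flags;
};

struct header unpack_header(const uint8_t bytes[2])
{
    /* Assemble the value explicitly so host byte order does not matter. */
    uint16_t raw = (uint16_t)((bytes[0] << 8) | bytes[1]);
    struct header h;

    h.version = (raw >> 12) & 0xFu;
    h.length  = (raw >> 4)  & 0xFFu;
    h.flags   =  raw        & 0xFu;
    return h;
}
```

Because every field position is written out in the code, the result does not depend on how a particular compiler orders or pads bit-fields.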

~~~
gdwatson
Implementation-defined behavior is still defined, in this case probably by the
platform ABI. It's simply not portable.

~~~
Too
I've had very nasty experiences with bitfields even though the order was well
defined on that platform. It was an embedded system where someone got the idea
to create bitfield access to a cpu register.

The problem was that some bits in that register would be cleared automatically
each time, so even if you wrote 1 it would read back as 0 the next time. Some
other bit would always trigger an action if you wrote to it, regardless of
whether you wrote the same value as it had before. This was called the
send_bit, and when you wrote 0 it sent some stuff onto the network cable. So
code that looked perfectly fine like this:

    
    
        register.bitsa = 0x2;
        register.bitsb = 0x1;
        register.send_bit = 0;
    

This actually sent 3 packets instead of 1, because the compiler translated each
write of a bitfield member into a write of the whole register. Some other bits
in bitsa could also get corrupted by the bitsb write, since they didn't read
back as they were written. I assume the CPU had a minimum addressable unit of 8
bits, so the compiler had to translate each line into something like register =
(register | setbits) & clearbits; which requires reading the previous value
each time just to write a single bit. This is very easy to overlook when you
just see the three lines of code written in a neat sequence.
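
One common way around this is to build the whole value in an ordinary variable and store it to the register exactly once. The sketch below assumes an 8-bit register with an invented field layout; `compose_reg`, `write_reg`, and the mask macros are hypothetical names, not anything from the system described above.

```c
#include <stdint.h>

/* Hypothetical layout of an 8-bit control register:
 *   bits 7..4  bitsa
 *   bits 3..1  bitsb
 *   bit  0     send_bit
 */
#define BITSA_SHIFT 4
#define BITSA_MASK  0xF0u
#define BITSB_SHIFT 1
#define BITSB_MASK  0x0Eu
#define SEND_MASK   0x01u

/* Build the complete register value in an ordinary variable. */
uint8_t compose_reg(unsigned bitsa, unsigned bitsb, unsigned send)
{
    unsigned v = 0;
    v |= (bitsa << BITSA_SHIFT) & BITSA_MASK;
    v |= (bitsb << BITSB_SHIFT) & BITSB_MASK;
    v |= send ? SEND_MASK : 0u;
    return (uint8_t)v;
}

/* Exactly one store to the hardware register and no read of it, so
 * self-clearing bits and write-triggered actions are touched only once. */
void write_reg(volatile uint8_t *reg, unsigned bitsa, unsigned bitsb,
               unsigned send)
{
    *reg = compose_reg(bitsa, bitsb, send);
}
```

With this shape, the number of bus writes is visible in the source: one call to `write_reg` is one store.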

~~~
restalis
"sent 3 packets instead of 1 because the compiler translated each write of a
bitfield member to a write of the whole register"

So the problem was that the language gives the impression of offering more
control than the hardware actually provides. In this case bit access would
have been better left to platform-dependent libraries (whenever it involves
something more than what plain mask usage can do).

------
saurik

        *(const char * + char *)  The type of int i is converted to 'char *' and multiplied by sizeof(char)
    

I am pretty certain this explanation does not make any sense: what is really
happening here is that the int, for purposes of the addition, is measured in
units of the size of the object being pointed to; there is no meaning I know
of to adding two pointers.

    
    
        /*  This works because "Hello"[5] == 5["Hello"].*/
    

At this point, you could really just say the following:

    
    
        /*  This works because a[b] == *(a + b), and addition is commutative. */
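
Both points can be checked with a small sketch; the function names `same_element` and `step_in_bytes` are made up for illustration.

```c
#include <stddef.h>

/* Returns 1 if all four spellings select the same character.
 * s must point to at least i + 1 readable chars. */
int same_element(const char *s, size_t i)
{
    return s[i] == *(s + i)      /* definition of [] */
        && *(s + i) == *(i + s)  /* pointer + integer is commutative */
        && *(i + s) == i[s];     /* so integer[pointer] is valid too */
}

/* The integer is scaled by the pointed-to size: arr + 1 is one int further,
 * i.e. sizeof(int) bytes, not one byte. */
size_t step_in_bytes(void)
{
    int arr[2] = {0, 0};
    return (size_t)((char *)(arr + 1) - (char *)arr);
}
```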

------
skarap
This reminded me of another C syntax strangeness: "Flexible array member". It
allows you to do something like this:

        struct items_with_header {
            int header_field1;
            unsigned int length;
            double array[];
        };

Then allocate enough memory and use the struct to access it.
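
A minimal sketch of the allocation step, assuming C99 or later. The helper name `items_new` is hypothetical; the struct is the one from the comment.

```c
#include <stddef.h>
#include <stdlib.h>

struct items_with_header {
    int header_field1;
    unsigned int length;
    double array[];          /* flexible array member: must be last */
};

/* Allocate the header plus room for n doubles in one block.
 * Returns NULL on allocation failure; the caller releases it with free(). */
struct items_with_header *items_new(unsigned int n)
{
    /* sizeof the struct does not count the flexible array's elements,
     * so their storage is added explicitly. */
    struct items_with_header *p =
        malloc(sizeof *p + (size_t)n * sizeof p->array[0]);
    if (p == NULL)
        return NULL;

    p->header_field1 = 0;
    p->length = n;
    for (unsigned int i = 0; i < n; i++)
        p->array[i] = 0.0;
    return p;
}
```

Header and payload live in a single allocation, so one `free()` releases both.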

Used it once in a hash-table implementation.

~~~
brandmeyer
This might be my biggest gripe with C's syntax.

typename identifier[];

means three different things, depending on where it appears: As a function
argument, it is a pointer that will be accessed like an array. As a variable
with automatic storage duration, it is an array whose size will be determined
by its braced initializer list. In the middle of a struct definition, it's
illegal. And at the end of a struct definition, it is a flexible array member.
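
The three legal meanings side by side, as a sketch with hypothetical names (`second_via_param`, `local_count`, `packet`):

```c
#include <stddef.h>

/* 1. As a parameter, int a[] is adjusted to int *a. Incrementing it is
 *    only legal because a is really a pointer, not an array. */
int second_via_param(int a[])
{
    a++;
    return a[0];
}

/* 2. As a variable, the element count comes from the initializer. */
size_t local_count(void)
{
    int v[] = {1, 2, 3};
    return sizeof v / sizeof v[0];
}

/* 3. As the last member of a struct, it is a flexible array member. */
struct packet {
    size_t len;
    unsigned char data[];
};
```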

~~~
seba_dos1
My favorite is a similar thing, but in C++ and much more confusing:

    
    
      class Bar {};
      
      class Foo {
      public:
      	Foo(const Bar &c) { }
      	void method() { }
      };
      
      int main() {
      	Foo foo(Bar());
      	foo.method();
      	return 0;
      }
    

"foo.method();" doesn't compile because:

    
    
      error: request for member ‘method’ in ‘foo’, which is of non-class type ‘Foo(Bar (*)())’
    

That's right: "foo" is declared as a function that returns an instance of Foo
and takes one argument, a pointer to a function that returns an instance of
Bar :) But when you add additional parentheses to the line above:

    
    
      Foo foo((Bar()));
    

then "foo" is an instance of Foo created by passing a new instance of Bar to
its constructor, which is more like what you'd expect by reading the code, and
the code compiles. Fun!

~~~
BudVVeezer
Yes, the most vexing parse.

[https://en.wikipedia.org/wiki/Most_vexing_parse](https://en.wikipedia.org/wiki/Most_vexing_parse)

------
white-flame
Unions? Function pointers? Typedefs? While it might be bad for karma to point
out, intro to C certainly isn't what I expect to be news to "hackers", as per
the site's namesake.

------
HelloNurse
To consider function types, or unjustified assumptions about bitfield unions,
or the use of parentheses to control nesting of arrays and pointers in
declarations "strange", one must be averse to the C language to the point of
intolerance. Backlash from working on a C compiler and wishing the task was
easier?

------
andrewchambers
I had an "ohh wow" moment when I realized that the keyword typedef is,
syntactically, a storage-class specifier. This means it can go anywhere static
can go. It just means no variable is introduced, only a type name; otherwise it
is the same syntax as declarations.

~~~
evincarofautumn
Same. Which, for those unfamiliar, makes these declarations perfectly valid:

    
    
        size_t typedef length;
    
        struct {
          int x, y;
        } typedef foo, *pfoo;

------
drauh
Wait till you get a load of the Obfuscated C Contest

~~~
biot
One of my favorite entries:
[http://www.ioccc.org/1988/westley.c](http://www.ioccc.org/1988/westley.c)

    
    
      #define _ -F<00||--F-OO--;
      int F=00,OO=00;main(){F_OO();printf("%1.3f\n",4.*-F/OO/OO);}F_OO()
      {
                  _-_-_-_
             _-_-_-_-_-_-_-_-_
          _-_-_-_-_-_-_-_-_-_-_-_
        _-_-_-_-_-_-_-_-_-_-_-_-_-_
       _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
       _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
      _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
      _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
      _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
      _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
       _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
       _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
        _-_-_-_-_-_-_-_-_-_-_-_-_-_
          _-_-_-_-_-_-_-_-_-_-_-_
              _-_-_-_-_-_-_-_
                  _-_-_-_
      }

------
thwest
I was under the impression that C11/C99 only guaranteed that the most recently
assigned union member would have an initialized value.

~~~
brandmeyer
Strictly speaking accessing that union both ways ~~~violates the strict
aliasing rules~~~ isn't portable. However, it is such a common idiom that GCC
and other compilers explicitly allow using unions to get around the strict
aliasing rules, so long as the access is always performed through the union.

~~~
dmm
Type punning through a union is allowed by C99/C11 in that the behavior is
unspecified (not undefined), and a footnote clarifies that the behavior is the
expected one.

[http://stackoverflow.com/questions/11639947/is-type-punning-...](http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified)
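
For comparison, a sketch of union punning next to the `memcpy` alternative, which avoids the aliasing question altogether. The function names are hypothetical, and both versions assume that `float` is 32 bits wide, which the C standard does not guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Both functions assume sizeof(float) == sizeof(uint32_t); that holds on
 * typical IEEE 754 platforms but is an assumption, not a guarantee. */

/* Type punning through a union: write one member, read the other. */
uint32_t bits_via_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Copying the object representation with memcpy does not rely on the
 * union rules at all. */
uint32_t bits_via_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On compilers that document union punning as supported, the two functions should return the same bit pattern for any input.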

------
nemesisrobot
Isn't the first example undefined behavior? I always thought you shouldn't
assign data to a union using one member, then access the data using a
different member.

~~~
skarap
IIRC it was undefined behavior until C99. Then it became implementation-
defined.

------
halosghost
> All of these examples you'll see here will compile without warnings or
> errors even with very strict compiler flags in gcc and clang (gcc -Wall
> -ansi -pedantic -std=c89 main.c)

Umm, that's really not that restrictive. Use `clang -Weverything -std=c11
main.c` if you want strict warnings.

