
Building a C Compiler Type System – Part 1: The Formidable Declarator - robertelder
http://blog.robertelder.org/building-a-c-compiler-type-system-the-formidable-declarator/
======
bluetomcat
The key issue with C declarations is that "declarations mirror use" which
means that you can interpret them this way:

    
    
        int *p; // dereferencing "p" gets you an "int"
        int **q; // dereferencing "q" twice gets you an "int"
        char a[10]; // subscripting "a" gets you a "char"
        void (*f)(void); // calling f() gives you "void"
    

People who write the asterisk next to the type often have a hard time
understanding this.
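
A minimal sketch of the "declarations mirror use" reading, assuming a C11 compiler for `_Generic`. The helper macros `IS_INT` and `IS_CHAR` and the functions `bump` and `mirror_use_demo` are made up for illustration; each declarator from the list above is reused verbatim as an expression and its type is checked:

```c
#include <assert.h>

/* Hypothetical helpers: 1 if the expression has exactly this type (C11 _Generic). */
#define IS_INT(e)  _Generic((e), int: 1, default: 0)
#define IS_CHAR(e) _Generic((e), char: 1, default: 0)

static int calls;                       /* counts calls made through f */
static void bump(void) { calls++; }

/* Returns 1 if every declarator, reused as an expression, has the base type. */
int mirror_use_demo(void)
{
    int n = 42;
    int *p = &n;              /* the expression *p is an int */
    int **q = &p;             /* the expression **q is an int */
    char a[10] = "hi";        /* the expression a[0] is a char */
    void (*f)(void) = bump;   /* the expression (*f)() is void */

    int before = calls;
    (*f)();                   /* used exactly as it was declared */

    assert(**q == 42 && a[0] == 'h');
    return IS_INT(*p) && IS_INT(**q) && IS_CHAR(a[0]) && calls == before + 1;
}
```

Calling `mirror_use_demo()` from a test or from `main` should return 1 if the reading holds.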

~~~
TickleSteve
absolutely wrong.

the 'type' is pointer-to-int, the 'name' is 'p', hence 'int* p'.

I understand pointers quite well, thank you.

~~~
ScottBurson
> the 'type' is pointer-to-int, the 'name' is 'p', hence 'int* p'.

It would be great if the language actually worked that way, because that's the
natural way to think about it. Unfortunately, it doesn't. In the simple case
of a single pointer variable, we can put the star next to the type and pretend
that C works the way we think, but it's a fiction.
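
A short sketch of where the fiction breaks down, assuming C11 for `_Generic`. The macro `IS_PTR_TO_INT` and both function names are hypothetical; the only point is that the star belongs to the declarator it precedes, not to the `int`:

```c
#include <assert.h>

/* Hypothetical helper: 1 if the expression has type "pointer to int". */
#define IS_PTR_TO_INT(e) _Generic((e), int *: 1, default: 0)

/* Star written next to the type: only the first declarator is a pointer. */
int second_is_pointer_star_on_type(void)
{
    int* a = 0, b = 0;        /* parsed as: int (*a), (b) */
    assert(IS_PTR_TO_INT(a));
    return IS_PTR_TO_INT(b);  /* b is a plain int, so this is 0 */
}

/* Star written on each declarator: both names are pointers. */
int second_is_pointer_star_on_each(void)
{
    int *a = 0, *b = 0;
    assert(IS_PTR_TO_INT(a));
    return IS_PTR_TO_INT(b);  /* b is an int *, so this is 1 */
}
```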

~~~
Gankro
I don't see how anything they said is incorrect. The type is int*, the name is
p. C just has this oddly optimized multiple-declarator syntax that, for some
reason, makes it easy to declare an array of int, a pointer to int, and a
function returning int all at once, instead of making it easy to declare
multiple pointers to int.

It seems to be a perfectly rational reaction to say "oh this was a stupid
idea, let's not take advantage of it". (which is honestly a great way to
approach a lot of C's features)
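
To make the "all at once" case concrete, here is a small sketch. The names `arr`, `ptr`, `fn` and `mixed_declaration_demo` are invented for the example; one declaration with the single specifier `int` introduces an array, a pointer, and a function:

```c
#include <assert.h>

/* One declaration, three different types, all sharing the specifier "int". */
int arr[3], *ptr, fn(void);

int fn(void) { return 3; }

/* Uses each name according to the type its own declarator gave it. */
int mixed_declaration_demo(void)
{
    ptr = arr;                 /* arr is an array of 3 int and decays to int * */
    ptr[1] = fn();             /* fn is a function returning int */
    assert(sizeof arr == 3 * sizeof(int));
    return arr[1];
}
```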

~~~
jotux
C's comma operator syntax is also weird and, in my experience, many people
don't really understand it that well.

edit: In reference to the last line: "oh this was a stupid idea, let's not
take advantage of it".

It's a weird feature that generally makes the code more difficult to
understand but people still use it a lot, similar to multiple declarations on
a single line with pointers mixed in.

~~~
penguinduck
Maybe you don't understand it that well because the described usage of a comma
is not a comma operator (or any kind of operator).

~~~
imtringued
[https://en.wikipedia.org/wiki/Comma_operator](https://en.wikipedia.org/wiki/Comma_operator)

~~~
penguinduck
Your point? The multiple declaration syntax which is being discussed here uses
comma _separators_ , not operators.
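
A small sketch of the distinction, with hypothetical names (`comma_result`, `first_of`, `comma_demo`). The commas in declarations and argument lists are separators; only the ones inside an expression are the comma operator:

```c
#include <assert.h>

struct comma_result { int c, d; };   /* separator between member declarators */

/* Commas in a parameter list or argument list are separators too. */
static int first_of(int x, int y) { (void)y; return x; }

struct comma_result comma_demo(void)
{
    int a = 1, b = 2;           /* separator between two init-declarators */
    struct comma_result r;

    r.c = (a += 10, b);         /* operator: a becomes 11, the value is b */
    r.d = a, b += 10;           /* operator binds looser than '=': (r.d = a), (b += 10) */

    assert(first_of(a, b) == 11);   /* separator: two arguments are passed */
    return r;
}
```

Here `comma_demo()` yields `c == 2` and `d == 11`, the latter because the assignment happens before the comma operator is applied.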

------
akkartik
Avoiding this hairy problem was why I went[1] with a more verbose and more
alien-looking -- but much more regular -- s-expression syntax for what is
still exactly a C type system:

    
    
      number
      (address number)
      (address address number)
      (address array character)
      (map (address array character) (list number))
      (function number -> number)
    

etc. For simple types you can replace brackets with colons:

    
    
      address:array:character
    

But you have full expressiveness if you need it.

[1] [https://github.com/akkartik/mu](https://github.com/akkartik/mu)

~~~
bluetomcat
IMHO, C's type system is quite orthogonal but is let down by the confusing
declarator syntax. I designed my own experimental statically-typed language
which basically inherits the C type system, but uses a "human-oriented"
declaration syntax:

    
    
        a: int; // int a
        p: ptr(int); // int *p
        q: ptr(ptr(long)); // long **q
        arr: int[20]; // int arr[20]
        arrp: ptr[20](int); // int *arrp[20]
        parr: ptr(int[20]); // int (*parr)[20]
        fp: fptr(a: int, b: int): long; // long (*fp)(int a, int b)
    

[https://github.com/bbu/quaint-lang](https://github.com/bbu/quaint-lang)

~~~
akkartik
Ooh, Quaint looks _very_ interesting. I think there's a lot of overlap with my
Mu project, beyond these minor syntactic decisions. Lots of room for compare
and contrast, and for sharing and stealing ideas :)

------
userbinator
It's worth noting that two of the example programs in K&R are a simple parser
and "unparser" for (a subset of, but still quite complete) the declaration
syntax, and each is only a few dozen lines.

IMHO the biggest difficulty that beginners face with the syntax is entirely
because they attempt to parse it left-to-right and aren't following the
precedence; once you realise that the operators (), [], and * in declarations
have the exact same precedence they do in the rest of the language, and that
expressions like 2 * (3 % foo(i + 4 / x[j])) - 1 are not read left-to-right
either, it all comes together and makes perfect sense.

Starting at the identifier (or where it would go, in an abstract declarator)
and reading outwards following the precedence rules (recursively applying the
same procedure to the parameter lists of function declarators) is the only
correct way to parse these declarations, and it is basically what the example
programs in K&R illustrate.
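
The reading order described above can be sketched as a tiny recursive-descent parser. This is not the K&R program itself, only a simplified sketch in the same spirit: it assumes a single-word base type, accepts only empty `()` parameter lists, and keeps its state in file-scope variables, so it is not reentrant. All names (`describe`, `dcl`, `direct_dcl`, and so on) are chosen for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Parser state for this sketch (hypothetical names, not K&R's). */
static const char *src;   /* unread input */
static char out[256];     /* description built so far */
static char name[32];     /* identifier being declared */

static void skip_ws(void) { while (isspace((unsigned char)*src)) src++; }
static void append(const char *s) { strncat(out, s, sizeof out - strlen(out) - 1); }
static int dcl(void);

/* direct-dcl: name | ( dcl ), followed by any number of () or [size] suffixes. */
static int direct_dcl(void)
{
    skip_ws();
    if (*src == '(') {                        /* grouping parentheses */
        src++;
        if (!dcl()) return 0;
        skip_ws();
        if (*src != ')') return 0;
        src++;
    } else if (isalpha((unsigned char)*src) || *src == '_') {
        size_t n = 0;
        while ((isalnum((unsigned char)*src) || *src == '_') && n < sizeof name - 1)
            name[n++] = *src++;
        name[n] = '\0';
    } else {
        return 0;
    }
    for (;;) {                                /* suffixes bind tighter than '*' */
        skip_ws();
        if (src[0] == '(' && src[1] == ')') {
            append(" function returning");
            src += 2;
        } else if (*src == '[') {
            append(" array[");
            src++;
            while (*src && *src != ']') { char c[2] = { *src++, '\0' }; append(c); }
            if (*src != ']') return 0;
            src++;
            append("] of");
        } else {
            break;
        }
    }
    return 1;
}

/* dcl: optional '*'s, then direct-dcl; the stars are emitted after the suffixes. */
static int dcl(void)
{
    int stars = 0;
    skip_ws();
    while (*src == '*') { stars++; src++; skip_ws(); }
    if (!direct_dcl()) return 0;
    while (stars-- > 0) append(" pointer to");
    return 1;
}

/* Describes "T declarator", where T is one word such as "int" or "char". */
const char *describe(const char *decl)
{
    static char result[320];
    char base[32];
    size_t n = 0;

    assert(decl != NULL);
    src = decl;
    out[0] = name[0] = '\0';
    skip_ws();
    while (isalpha((unsigned char)*src) && n < sizeof base - 1)
        base[n++] = *src++;
    base[n] = '\0';
    if (n == 0 || !dcl()) return "syntax error";
    skip_ws();
    if (*src != '\0') return "syntax error";
    snprintf(result, sizeof result, "%s:%s %s", name, out, base);
    return result;
}
```

With this sketch, `describe("int (*daytab)[13]")` gives "daytab: pointer to array[13] of int", while `describe("int *daytab[13]")` gives "daytab: array[13] of pointer to int". The only difference is whether the parentheses override the higher precedence of `[]` over `*`.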

Thus the "clockwise spiral rule" mentioned in the post is applicable only to
certain cases and incorrect in general, as Linus Torvalds explains:
[https://plus.google.com/+gregkroahhartman/posts/1ZhdNwbjcYF](https://plus.google.com/+gregkroahhartman/posts/1ZhdNwbjcYF)

Edit: upon pondering the example

    
    
        int f((((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))));
    

I do not think it is legal in C89/90/99/11, since a parameter-declaration must
begin with declaration-specifiers, and the declaration-specifiers cannot begin
with an opening parenthesis. (Go to
[http://www.quut.com/c/ANSI-C-grammar-y.html](http://www.quut.com/c/ANSI-C-grammar-y.html)
and start following the rules via
declaration->init_declarator_list->init_declarator->declarator->direct_declarator->parameter_type_list->parameter_list->parameter_declaration.)

On the other hand, I believe this:

    
    
        int f(int(((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))));
    

is legal and the type of the parameter is (pointer to) function returning int,
with plenty of redundant parentheses around the abstract declarator.
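
One way to test that belief is to pair a prototype written with redundant parentheses against a conventionally written definition, since a compiler has to reject incompatible redeclarations. This is only a sketch with fewer parentheses than the example above, and the names `apply`, `seven` and `apply_demo` are hypothetical:

```c
#include <assert.h>

/* Prototype whose parameter is an abstract declarator in redundant parentheses.
   Assumption being tested: it means "function returning int", adjusted to a pointer. */
int apply(int((())));

/* Definition spelled the conventional way; it must be compatible with the prototype. */
int apply(int (*cb)()) { return cb(); }

static int seven(void) { return 7; }

int apply_demo(void)
{
    int r = apply(seven);     /* seven converts to a pointer to function */
    assert(r == 7);
    return r;
}
```

If the two declarations of `apply` did not agree on the parameter type, the translation unit should fail to compile with a conflicting-types diagnostic.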

~~~
robertelder
Thanks for that comment, I'll read over that thread to see if I missed
anything. Upon closer reading of the spiral rule, I don't think I have
actually been following it as closely as I thought, so I don't think it
affects the correctness of any other parts of the article. All this time, I
had been kind of thinking that the spiral rule was a sort of 'gold standard'
way to remember how to read declarations, but it doesn't clearly explain how
to parse stuff like

    
    
       a[1][2][3][4];
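
For the multi-dimensional case, reading outward by precedence means applying the `[]` suffixes left to right, starting from the name. A small sketch, assuming an element type of `int` (the line above omits the type) and a made-up helper `dim`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: array of 1 array of 2 array of 3 array of 4 int. */
static int a[1][2][3][4];

static_assert(sizeof a == 1 * 2 * 3 * 4 * sizeof(int), "24 ints in total");

/* Number of elements at each nesting level, outermost first. */
size_t dim(int level)
{
    switch (level) {
    case 0:  return sizeof a / sizeof a[0];
    case 1:  return sizeof a[0] / sizeof a[0][0];
    case 2:  return sizeof a[0][0] / sizeof a[0][0][0];
    case 3:  return sizeof a[0][0][0] / sizeof a[0][0][0][0];
    default: return 0;
    }
}
```

The leftmost subscript is the outermost array, so `dim(0)` through `dim(3)` come out as 1, 2, 3, 4.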

------
bla2
Fun fact: `typedef int I;` and `int typedef I;` are both valid and do the same
thing.
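
A quick sketch to check this, assuming C11 for `_Generic`; the second alias is named `J` here only so both declarations can live in one file, and `IS_INT` and `typedef_order_demo` are made-up names. Note that the standard marks placing a storage-class specifier anywhere but first as obsolescent, so some compilers warn about the second form:

```c
#include <assert.h>

typedef int I;     /* conventional order */
int typedef J;     /* same meaning: declaration specifiers may appear in any order */

/* Hypothetical helper: 1 if the expression has type int. */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

int typedef_order_demo(void)
{
    I x = 1;
    J y = 2;
    assert(x + y == 3);
    return IS_INT(x) && IS_INT(y);
}
```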

~~~
gsg
And if you want to go weirder, implicit int (a C89 feature that C99 removed)
means that you can leave out the `int`:

    
    
        typedef x;
    

Even "better", the declarator list in a C declaration can be empty, in order
to allow for struct/enum declarations that do not list any variables. So you
can leave out the variable, add qualifiers, etc:

    
    
        typedef;
        const typedef;
        typedef const;
    

It's a really fun syntax in some ways.

