The Clockwise/Spiral Rule of C declarations (c-faq.com)
225 points by xvirk on Oct 23, 2016 | 68 comments



This has been posted before[1], and the "spiral rule" is a load of hooey.

The correct rule is "follow the C grammar". An easier to remember and also correct rule is "start at the identifier being declared; work outwards from that point, reading right until you hit a closing parenthesis, then left until you hit the corresponding open parenthesis, then resume reading right..." (this is sometimes called the "right-left rule"[2]).

The "spiral rule" dances around the truth without actually being precise enough to be useful.

[1] https://news.ycombinator.com/item?id=5079787 [2] http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html
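
As an illustration of the right-left rule described above, here is a minimal sketch in C. The names `table`, `pick`, `fp` and `third_element` are hypothetical and chosen only for this example; the comment block walks the declaration outward from the identifier.

```c
#include <assert.h>

static int table[3] = {10, 20, 30};

/* pick: function taking int, returning pointer to array of 3 int. */
static int (*pick(int unused))[3] {
    (void)unused;
    return &table;
}

/* Reading outward from the identifier fp:
 *   fp             start here: fp is ...
 *   (*fp)          right hits ')', so go left: ... a pointer to ...
 *   (*fp)(int)     resume right: ... a function taking int, returning ...
 *   *(*fp)(int)    right hits ')', so go left: ... a pointer to ...
 *   (...)[3]       resume right: ... an array of 3 ...
 *   int            nothing left on the right, so go left: ... int.
 */
static int (*(*fp)(int))[3] = pick;

/* The result of calling fp, once dereferenced, really is an array of 3 int. */
static_assert(sizeof *fp(0) == 3 * sizeof(int), "*fp(0) is int[3]");

static int third_element(void) {
    return (*fp(0))[2];   /* call, dereference, index: mirrors the declaration */
}
```

Calling `third_element()` goes through every layer named in the walk-through.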


I used to complain about re-posts too. But this is the first time I have seen a reference to this website, which looks interesting to me. So I'm happy to have a new URL on my to-read list.

Thanks!


Way simpler: from inside out, read any subpart of the type as an expression. (Arrays have precedence over pointers, as usual.) The type that remains is that expression's type. So e.g. given the type:

    const char *foo[][50]
the following expressions have the following types:

     foo       -> const char *[][50]
     foo[0]    -> const char *  [50]
     foo[0][0] -> const char *
    *foo[0][0] -> const char
Another example:

    int (*const bar)[restrict]
    
      bar     -> int (*const)[restrict]
     *bar     -> int         [restrict]
    (*bar)[0] -> int
One more:

    int (*(*f)(int))(void)
    
        f        -> int (*(*)(int))(void)
      (*f)       -> int (*   (int))(void)
      (*f)(0)    -> int (*        )(void)
     *(*f)(0)    -> int            (void)
    (*(*f)(0))() -> int
(In this case, the last expression is more readily written as f(0)(), since pointers to functions and functions are called using the same syntax.)
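
The expression-by-expression reading above can be checked by a compiler. The following is a sketch only: the outer array bound of 2 for `foo` and the helper functions `seven` and `chooser` are assumptions added so that the declarations have storage and something to point at.

```c
#include <assert.h>

/* Assumed outer bound of 2 so that foo is a complete object. */
const char *foo[2][50];

static_assert(sizeof foo[0] == 50 * sizeof(const char *),
              "foo[0] has type const char *[50]");
static_assert(sizeof foo[0][0] == sizeof(const char *),
              "foo[0][0] has type const char *");
static_assert(sizeof *foo[0][0] == sizeof(char),
              "*foo[0][0] has type const char");

/* Hypothetical targets for the function-pointer example. */
static int seven(void) { return 7; }
static int (*chooser(int unused))(void) { (void)unused; return seven; }

static int (*(*f)(int))(void) = chooser;

/* The fully spelled-out use, mirroring the declaration token for token. */
static int call_long_form(void) { return (*(*f)(0))(); }

/* The shorter spelling: calling through a function pointer needs no '*'. */
static int call_short_form(void) { return f(0)(); }
```

Both call forms reach the same function, which is the point made in the parenthetical above.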


What happened to the const in the second example?


The pointer itself is const; not what it points to.

    bar  -> const pointer to mutable array of ints
    *bar -> mutable array of ints
And const pointers are dereferenced with * , not (* const), so the rule needs an exception for const pointers (as well as volatile pointers).


That const applies only to the bar symbol itself, not to anything it points to. So once bar is dereferenced, the const doesn't matter. The beauty of this method is that it predicts that correctly without having to think about it.


Yeah, I think they confused

  const *
with

  * const


Nope, * const means that the identifier (i.e. thing to the right of the star) is const. That is, in this example, the symbol "bar" is const, not anything that it points to. So once you dereference it, the const no longer matters.


You're right, I got confused. const is read as "the thing to the right of me is const". const * means const pointer, * const means pointer to const.


Vice versa. Const pointer:

    int *const p;
Pointer to const value:

    const int *p
    int const *p
>const is read as "the thing to the right of me is const"

const is a type qualifier and is read according to its position in the declaration, not just "to the right".
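
A small sketch of the two placements, for anyone who wants to try it. The function and variable names are hypothetical; the commented-out lines are the ones a conforming compiler is expected to reject.

```c
#include <assert.h>

static int demo_const_placement(void) {
    int a = 1, b = 2;

    int *const const_ptr = &a;       /* the pointer is const, the int is not */
    *const_ptr = 10;                 /* fine: writes through to a */
    /* const_ptr = &b; */            /* error: const_ptr itself is read-only */

    const int *ptr_to_const = &a;    /* the int is const when seen through it */
    ptr_to_const = &b;               /* fine: the pointer can be reseated */
    /* *ptr_to_const = 20; */        /* error: the pointee is read-only here */

    int const *same_type = ptr_to_const;   /* same type as const int * */

    return a + *same_type;           /* a is now 10, *same_type is b */
}
```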


Wow. I literally was staring at my test case when I wrote it and I _still_ got it wrong. I think I need to get some more sleep...


This would make a great CLI tool!


The real rule is that the type construction operators mirror the unary and postfix family of operators (declaration follows use). For instance unary * declares a pointer, mimicking the dereference operator, and postfix [] and () declare arrays and functions, mimicking array indexing and function call.

To follow the declaration you make use of the fact that postfix operators have a higher precedence than unary ones, and that unary operators are of course right-associative, whereas postfix operators are left-associative (necessarily so, since both have to "bind" with their operand).

So given

   int ***p[3][4][5];
we follow the higher-precedence postfixes in left-to-right order: [3], [4], [5]. Then we run out of those, and follow the lower-precedence * * * in right-to-left order.

If there are parentheses present, they split this process. We go through the postfixes, and then the unaries within the parens. Then we do the same outside those parens (perhaps inside the next level of parens):

   int ****(***p[3][4][5])[6][7];
               1 2  3  4
            765
                           8  9
       1111
       3210
Start at p, follow postfixes, then unaries within parens. Then the postfixes outside the parens and remaining unaries.

The result is in fact a spiral, just from going root, postfix, unary, out, postfix, unary, out. We just don't have to focus on the spiral aspect of it.
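
The ordering described above (postfixes first, then the unaries, then out through the parentheses) can be sketched as a set of compile-time checks. The second declaration is renamed `q` here only so that both can live in one file; `innermost_elements` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

int ***p[3][4][5];

/* Postfixes first, left to right: array of 3, of arrays of 4, of arrays of 5 */
static_assert(sizeof p == 3 * sizeof p[0], "p is an array of 3 ...");
static_assert(sizeof p[0] == 4 * sizeof p[0][0], "... arrays of 4 ...");
static_assert(sizeof p[0][0] == 5 * sizeof p[0][0][0], "... arrays of 5 ...");
/* ... then the unaries: each element is an int *** */
static_assert(sizeof p[0][0][0] == sizeof(int ***), "... of int ***");

int ****(***q[3][4][5])[6][7];

/* Inside the parentheses: the [3][4][5] elements are triple pointers ... */
static_assert(sizeof q[0][0][0] == sizeof(int ****(***)[6][7]),
              "each element points (three levels deep) at an array");
/* ... outside them: to an array of 6 arrays of 7 int **** */
static_assert(sizeof ***q[0][0][0] == 6 * sizeof (***q[0][0][0])[0],
              "the pointee is an array of 6 ...");
static_assert(sizeof (***q[0][0][0])[0] == 7 * sizeof(int ****),
              "... arrays of 7 int ****");

/* Number of int **** objects in the array that the inner pointers lead to. */
static size_t innermost_elements(void) {
    return sizeof ***q[0][0][0] / sizeof(int ****);
}
```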


I used to think of the spiral rule as being a good guide, but then a commenter on HN showed me otherwise:

https://news.ycombinator.com/item?id=12053206


The real credit goes to Linus Torvalds, as I linked in that post, but I'll repeat the link again here:

https://plus.google.com/+gregkroahhartman/posts/1ZhdNwbjcYF


The rule is misleading in cases like:

    int* arr[][10];
The spiral rule would state "arr is an array of pointers to arrays of 10 ints", whereas it is actually "arr is an array of arrays of 10 pointers to int".

Instead, when you write declarations, do it from right-to-left, e.g.:

   char const* argv[];
"argv is an array of pointers to constant characters"

It doesn't help with reading, unfortunately.


I think the advice that helped me the most was "Declaration follows usage".

  int* arr[][10];
If you index twice into arr and then dereference, you'll get an int. So arr must be an array of array of pointer to int.
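
A quick way to confirm this reading is to let the compiler check the use. This is a sketch with an assumed outer bound of 2 (the original declaration leaves it open) and a hypothetical helper function.

```c
#include <assert.h>

int *arr[2][10];   /* outer bound 2 is an assumption so that arr has storage */

/* One index leaves an array of 10 pointers, not a pointer to an array. */
static_assert(sizeof arr[0] == 10 * sizeof(int *),
              "arr[0] is an array of 10 pointers to int");

static int index_twice_then_dereference(void) {
    static int value = 42;
    arr[1][3] = &value;    /* each element is an int * */
    return *arr[1][3];     /* same shape as the declarator: *arr[..][..] */
}
```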


Declaration follows usage is much easier to follow than the artificial spiral rule. IMHO, with a few typedefs, declaration follows usage can make things pretty simple.


To my mind that's one part of the reason the * belongs next to arr rather than next to int.

The other part is `int* x, y`.
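
For readers who have not run into it, a minimal sketch of what that declaration does, using C11 `_Generic` to ask the compiler for the types. The function name and the `target` variable are hypothetical.

```c
#include <assert.h>

static int y_is_plain_int(void) {
    int target = 0;
    int* x = &target, y = 0;   /* the '*' belongs to x only */

    static_assert(_Generic(x, int *: 1, default: 0), "x is a pointer to int");
    static_assert(_Generic(y, int: 1, default: 0), "y is a plain int");

    (void)x;
    (void)y;
    return _Generic(y, int: 1, default: 0);
}
```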


For this reason I strongly prefer writing

    char const
rather than

    const char
Is there a reason to prefer the second version? It's a lot more popular in my experience.


It's the difference between "declare a constant integer" and "declare an integer constant" and to me the former more accurately represents what you're doing since `const` is modifying `int`, `int` isn't modifying `const`.


Putting const on the right makes more sense when you have pointers or references. Then you just always read from right to left: `int const *` is a pointer to constant integer whereas `int * const` is a constant pointer to integer.

Also your argument about which modifies which is strongly anglocentric: there are plenty of people whose native language puts modifiers after the things they modify.


Contrast the first example with Golang:

  str [10]*byte
which reads exactly as it is declared: "str is an array of length 10 of pointers to byte" (byte is the Go equivalent of C char (mostly)).


Or Rust:

    let string: [&u8; 10];
string is an array of references to unsigned integers of 8 bits of length ten


Actually it's a semicolon.


Whoops, fixed


It can also be simpler if you use idiomatic modern C++ with std::array<T, n> and std::function<R(T1, T2)>.


Looking at "idiomatic modern C++", I am often at a loss for words at what lengths they've gone to in order to reinvent things while greatly obfuscating them in the process. Is there a std::pointer_to<T> too? I don't know, but something like this

    std::array<std::pointer_to<byte>, 10> str;
certainly does not look any more readable to me than

    byte *str[10];
. (Disclaimer: I mainly work with C, but find some C++ features genuinely useful, although the majority of the time they seem more like absurd complexity for the sake of complexity.)


I've never seen nor heard of pointer_to ever being used to declare a pointer to something. I believe it's used inside of custom allocators for a generic type that might not use a normal pointer as the pointer type, but would never be used for normal declarations like this.

std::array is useful for avoiding array-to-pointer decay, for getting value semantics, and for actually putting the array length into the type of a function parameter.


std::array does not exist because it is easier to read. It exists because C arrays behave strangely. Two examples: decay to pointer and no value semantics.
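
The two behaviours mentioned above can be seen from the C side. This is only a sketch in C, not C++: the struct wrapper `int10` is a hypothetical stand-in for the kind of value semantics std::array provides, and the function names are made up for the example.

```c
#include <assert.h>

/* Wrapping an array in a struct gives it value semantics in C. */
struct int10 { int v[10]; };

/* Decay: despite the [10], the parameter a has type int *. */
static int parameter_is_pointer(int a[10]) {
    (void)a;
    return _Generic(a, int *: 1, default: 0);
}

/* Plain arrays cannot be assigned, but a struct holding one can be copied. */
static int copy_is_independent(void) {
    struct int10 s = { {0} };
    struct int10 t = s;        /* copies all ten elements */
    t.v[0] = 5;                /* does not touch s */
    return s.v[0] == 0 && t.v[0] == 5;
}
```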


Or Nim:

    var str: array[10, ptr byte]
(and much richer types)

Edit: and while I'm here, Nim has other sensible syntax for this low level stuff...

    var b: byte = 10
    str[0] = addr b
    echo $str[0][]


I am not familiar with Go, and have heard many praises of its declaration syntax, but is its dereference operator postfix? That would make sense in such a case.

On the other hand, IMHO the whole "make declarations read left-to-right" idea is misguided --- plenty of other constructs exist in programming languages which simply can't be read left-to-right, but are nested according to precedence. I mean, you might as well make 3+4*3 evaluate to 21 if you want to try making everything consistently left-to-right, but I don't really see anyone complaining about not being able to understand operator precedence...


Go's dereference operator `*` is prefix, like in C.

The point here is that type declarations are regular to read, and those tend to be the tricky ones. Expressions tend not to be so difficult, and are more commonly factored if they become complex. For various reasons, type declarations are not so practically factorable.


When I came to Go, I hadn't used C or C++ for over a decade, only Java and C# in between. Using explicitly written pointers came flooding back, but the new "C for expressions, Pascal for declarations" syntax still takes getting used to.

Declaring `v * T` means we can write `* v` as an expression, so the use of token * is synchronized for both these uses, but I must vocalize the * in my head differently:

  `*T` vocalizes as "pointer to something of type T"
  `*v` vocalizes as "that pointed to by variable v"
  `&v` vocalizes as "pointer to variable v"
So my thought process when I see * goes: If it's in a type, say "pointer to", otherwise say the opposite of "pointer to", i.e. "that pointed to by". It feels like an inconsistent use of * whenever I'm writing Go code -- even though I know it's a natural result of Go using Pascal-style declaration syntax but C-style tokens.


But then you lose C's nice property that declaration and use are the same syntax.

For example, D also uses a similar type syntax, so in D if you declare:

  int[10][20] x;
  x[19][9] // is legal
In C:

  int x[10][20];
  x[9][19] // is legal
I think the correct solution would have been to make pointer syntax post-fix like the arrays and functions, so that you get the best of both worlds. Go-like declarations and C-like matchup between use and declarations.


I read a paper somewhere by Dennis Ritchie in which he explained the development of the C language and its pros and cons. In it he mentioned that reading complex declarations is a problem in C, and said that if we had placed the * operator to the left of the type it was qualifying, it would have been easier to write and understand more complex declarations.

(PS. Golang has the right idea, since it's developed by the guys who contributed to C)...


Go fixes a lot of C language design bugs, and the only cost you pay is garbage collection (which sometimes matters) and the loss of extreme memory layout control.


I guess that's a given with the amount of ease and fast prototyping that it provides; it had to make a lot of decisions for you beforehand...


What makes me wonder is why C ended up with such a syntax. That is, its contemporary, Pascal, has a very straightforward, unambiguous syntax.


The type syntax exactly matches the expression syntax used to destruct values of the type. It is very intuitive once you realize this.

The alternative would be for the type syntax to mirror the expression syntax used to construct values of the type. Functional languages tend to do this, particularly ones which prefer pattern matching over destructors.


Yeah, the rule is intuitive when I am writing the code, but I think the type declarations in other languages like Go are easier to read correctly when I am skimming through the code even though I am much more used to C. I am not sure how useful this mirroring of usage is in practice.


The C syntax is also unambiguous, and if you actually "get" C, it's what you'd naturally expect it to be.


unambiguous?

Have you ever had to write a C parser?


Yes, it is unambiguous, even if it is context-sensitive.

If you don't have the available type names then it becomes ambiguous.
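
A sketch of the dependence on known type names: the same token sequence `T * b` is a declaration in one scope and a multiplication in another. The names `T`, `as_declaration` and `as_expression` are hypothetical.

```c
#include <assert.h>

typedef int T;

static int as_declaration(void) {
    int v = 3;
    T * b = &v;            /* T names a type: declares b as pointer to int */
    return *b;
}

static int as_expression(void) {
    int T = 6, b = 7;      /* T is now an ordinary variable in this scope */
    return T * b;          /* same tokens, now a multiplication */
}
```

A parser can only tell the two apart by tracking which identifiers are typedef names at that point in the source.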


Seems overly complex. The way I learned it, and now teach, is to read the type backwards (int const * is 'pointer to const int') for const correctness, but anything that requires more complex parsing by a human should just be typedef'd into submission.


One problem with the web is that it will remember wrong information just as well as it will remember correct information.


While cdecl was probably written before some of you were born, it still does precisely one thing and does it well.

Cdecl (and c++decl) is a program for encoding and decoding C (or C++) type declarations.

http://linuxcommand.org/man_pages/cdecl1.html


Notably, one of the exercises in K&R (with a solution provided) is to write a mostly complete version of cdecl, which I think is great for dispelling much of the "magic" and increasing the understanding of how declarations are actually parsed.
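
To give a feel for what such a program involves, here is a much-reduced sketch of a recursive-descent declaration reader. It is not the K&R program; it is an illustration written for this thread, and all names in it (`explain`, `declarator`, `direct_declarator`, and so on) are assumptions. It handles only a one-word type, `*`, grouping parentheses, `[N]`, and empty parameter lists `()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char *src;   /* cursor into the declaration being read */
static char out[256];     /* English description built so far */

static void append_n(const char *s, size_t n) {
    size_t used = strlen(out);
    size_t room = sizeof out - used - 1;
    if (n > room) n = room;              /* truncate rather than overflow */
    memcpy(out + used, s, n);
    out[used + n] = '\0';
}

static void append(const char *s) { append_n(s, strlen(s)); }

static void skip_spaces(void) { while (*src == ' ') src++; }

static void declarator(void);

/* direct-declarator: name or ( declarator ), then any run of () and [N] */
static void direct_declarator(void) {
    skip_spaces();
    if (*src == '(') {                   /* grouping parentheses */
        src++;
        declarator();
        skip_spaces();
        if (*src == ')') src++;
    } else {                             /* the identifier being declared */
        const char *start = src;
        while (isalnum((unsigned char)*src) || *src == '_') src++;
        append_n(start, (size_t)(src - start));
        append(" is");
    }
    for (;;) {                           /* postfixes bind first, left to right */
        skip_spaces();
        if (src[0] == '(' && src[1] == ')') {
            append(" function returning");
            src += 2;
        } else if (*src == '[') {
            const char *close = strchr(src, ']');
            if (close == NULL) break;    /* malformed input: stop */
            append(" array");
            append_n(src, (size_t)(close - src) + 1);
            append(" of");
            src = close + 1;
        } else {
            break;
        }
    }
}

/* declarator: optional *s, then a direct-declarator; the *s are reported last */
static void declarator(void) {
    int stars = 0;
    skip_spaces();
    while (*src == '*') { stars++; src++; skip_spaces(); }
    direct_declarator();
    while (stars-- > 0) append(" pointer to");
}

/* Describe "<one-word type> <declarator>"; returns a pointer to a static buffer. */
static const char *explain(const char *decl) {
    const char *type;
    size_t type_len;
    src = decl;
    out[0] = '\0';
    skip_spaces();
    type = src;
    while (isalpha((unsigned char)*src)) src++;
    type_len = (size_t)(src - type);
    declarator();
    append(" ");
    append_n(type, type_len);
    return out;
}
```

The structure is the whole trick: `declarator` defers the pointers until `direct_declarator` has consumed the name and its postfixes, which is the same precedence order discussed elsewhere in this thread.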


That exercise was and probably still is above my pay grade.


And online: http://cdecl.org/


Once I tried to figure out how to parse complex C declarations just by reading the specification (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), that is, without consulting guides for laymen like this. But I gave up. I looked at what seemed a BNF-like description of the C grammar but I had no idea what it tells about the parsing rules. So I ended up using this guide: http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html With this I managed to implement an imitation of cdecl.


What is the value of bending ourselves to fit a confusingly designed old language, rather than bending the language to fit us? The very fact that articles like this have to be written indicates a failure of user interface design, which we needn't forever perpetuate.


Things break down utterly in the presence of typedefs. What is this?

    foo(*baz(bing,boff(*bratz)(biff)))(buff);


The “spiral rule” is just an approximation of the actual rule as defined in the standard: declaration follows usage.

Even with typedefs, that declaration means “when you call baz with a bing and a pointer (named bratz) to a function of type boff(biff), then you get back a pointer to a function of type foo(buff).”

It’s an extremely concise notation for expressing type information without (much) special type syntax, and I think it’s quite elegant in that way.


I'm impressed. How did you know where to start?


In C, the statement “type declarator;” is an assertion that “declarator” has the type “type”. In other words, if you read “declarator” as an expression (more or less), then it should have the type “type”. So here:

    foo (*baz(bing, boff (*bratz)(biff)))(buff);
“foo” is the type, and the rest is the declarator. Then you just break it down according to the usual precedence rules:

    baz(…)
“baz” is a function…

    baz(bing, …)
…which takes a “bing”, and…

    *bratz
…a pointer (arbitrarily named “bratz”)…

    (*bratz)(biff)
…to a function which takes a “biff”…

    boff(*bratz)(biff)
…and returns a “boff”…

    *baz(…)
…and “baz” returns a pointer…

    (*baz(…))(buff)
…to a function taking a “buff”…

    foo (*baz(…))(buff)
…and returning a “foo”.

With typedefs for function pointer types:

    typedef boff (*bratz_t)(biff);
    typedef foo (*baz_ret_t)(buff);

    baz_ret_t baz(bing, bratz_t);
Or for function types:

    typedef boff bratz_t(biff);
    typedef foo baz_ret_t(buff);

    baz_ret_t *baz(bing, bratz_t *);
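
One way to convince yourself that the three spellings agree is to put them in one file: a C compiler rejects redeclarations of `baz` whose types are not compatible. This is a sketch under assumptions: the concrete types behind `foo`, `bing`, `boff`, `biff` and `buff` are arbitrary placeholders, the function-type typedefs are renamed with an `_fn` suffix so both sets can coexist, and `truncate_to_foo` is a hypothetical helper.

```c
#include <assert.h>

/* Placeholder types, chosen to be distinct so mismatches would be caught. */
typedef int   foo;
typedef char  bing;
typedef long  boff;
typedef short biff;
typedef float buff;

/* The raw declaration from the thread. */
foo (*baz(bing, boff (*bratz)(biff)))(buff);

/* The same declaration via function-pointer typedefs. */
typedef boff (*bratz_t)(biff);
typedef foo (*baz_ret_t)(buff);
baz_ret_t baz(bing, bratz_t);

/* And again via function-type typedefs (renamed to avoid a clash). */
typedef boff bratz_fn(biff);
typedef foo baz_ret_fn(buff);
baz_ret_fn *baz(bing, bratz_fn *);

static foo truncate_to_foo(buff value) { return (foo)value; }

/* One definition satisfies all three declarations. */
baz_ret_t baz(bing tag, bratz_t callback) {
    (void)tag;
    (void)callback;
    return truncate_to_foo;
}
```

A call such as `baz('x', 0)(2.5f)` then reads exactly like the declarator: call baz, then call what it returns.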


The first red flag is that the rule says "clockwise" where there is clearly no way to distinguish clockwise from anticlockwise inside the code. Only the completely arbitrary choice of up/down direction of the drawing affects clockwiseness.

It's been 20 years(!) Why is this incorrect advice still up at c-faq?


Strings with an equal number of open and close parentheses in a valid order are counted by the Catalan numbers

    (())()(()())(())
These count the different arrangements of parentheses for function application. This guy is describing something like contour integration for computer programs.


Or, as the book Expert C Programming says, declarations in C are read boustrophedonically.


I always heard of the right-left rule, which seems simpler and more accurate to me.

http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html


I'd take the striking simplicity of a lisp any day compared to this mess.

Yeah, I know, I'm not good enough, I didn't study enough, I'm not enlightened enough. But why make things so overly complex in the first place?


This is the reason why golang declares the identifier before the type and the return type after the function parameters. This allows parsing any declaration from left to right.


Just don't make complex declarations in C, it's almost never useful and won't help anyone out. It'll confuse people and make your code write-only. Just put in a couple extra lines of code somewhere if you have to. It won't be the end of the world.


It'll certainly confuse people, but only those who aren't qualified to be doing anything with the code anyway.

"complex" is subjective. It reminds me of stupid "rules" like "don't use the ternary operator", "every function must be less than 20 lines" (I am not exaggerating --- this was on a Java project, however); and you could easily extend that to "every statement must have a maximum of one operator", "you must not use parentheses", "you must not use more than one level of indirection", etc. Where do you stop? To borrow a saying from UI, "if you write code that even an idiot can understand, only idiots will want to work on it." I don't think we should be forcing programmers to dumb-down code at all.

That said, I'm not advocating for overly complex solutions, and will definitely prefer a simpler solution, but you should know and use the language fully to your benefit.


>> It'll certainly confuse people, but only those who aren't qualified to be doing anything with the code anyway.

If the complexity can be avoided, why not avoid it? Removing complexity is not the same as dumbing down code. It will improve readability and maintainability.

This mindset is definitely applicable to declarations as well as code constructs.

edit: clarity.


Note the last sentence of my comment. I am not advocating unwarranted complexity at all, but just saying that there are cases where an increase in local complexity can reduce overall complexity of the system, and you should not be afraid of using the language to the best of your ability.


It'll certainly confuse people, but only those who aren't qualified to be doing anything with the code anyway.

And people wonder why there are so many broken C programs out there...


"if you write code that even an idiot can understand, only idiots will want to work on it."

To me, that makes about as much sense as when Ricky Bobby in Talladega Nights says "If you ain't first, you're last."



