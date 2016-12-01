Hacker News
A convenient untruth: Array notation in C is a lie (feabhas.com)

38 comments





This gets weird when you compare it to how structs work in C. Both are complex data types, so I sometimes forget that array semantics are totally different to struct/union semantics. Unlike arrays, in ANSI C, structs are real value types. You can pass them by-value to functions, return them by-value from functions and assign them by-value to variables of the same type. Also, structs never work like pointers to themselves. You have to dereference the pointer or use the -> syntactic sugar to access a member of a pointer to a struct.

Value type semantics enable some neat things; for example, you can zero all the members of a struct by assigning a compound literal (C99) to it. You can also make simple structs, like three uint8_ts for an RGB colour, and use them just like they were primitive types. In comparison, it seems almost archaic that you have to break out memset() and memcpy() to zero and move arrays.
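
A minimal C99 sketch of that contrast; the `struct rgb` type and the `invert`, `clear` and `copy_then_zero` helpers are hypothetical names chosen for illustration. The struct is copied, passed and returned by value, while the equivalent array operations need memcpy() and memset():

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical RGB colour: three uint8_t channels used like a primitive. */
struct rgb {
    uint8_t r, g, b;
};

/* Structs are passed and returned by value: c is a private copy,
   so the caller's struct is never modified. */
struct rgb invert(struct rgb c)
{
    c.r = (uint8_t)(255 - c.r);
    c.g = (uint8_t)(255 - c.g);
    c.b = (uint8_t)(255 - c.b);
    return c;
}

/* Zero every member by assigning a compound literal (C99). */
void clear(struct rgb *c)
{
    *c = (struct rgb){0};
}

/* Arrays cannot be assigned ("dst = src;" does not compile),
   so copying and zeroing fall back to memcpy() and memset(). */
void copy_then_zero(int dst[3], int src[3])
{
    assert(dst != src); /* memcpy must not be given overlapping buffers */
    memcpy(dst, src, 3 * sizeof *src);
    memset(src, 0, 3 * sizeof *src);
}
```

Usage: after `struct rgb c = {255, 165, 0}; struct rgb d = invert(c);` the original `c` is untouched, and `struct rgb e = c;` is an ordinary by-value copy.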

The equivalent in C# always made more sense to me (not comparing memory or allocation model between the languages, but simply syntax in relation to a person reading it):

    int[] arr = new int[5]; // C#

    int arr[5]; // C
The fact that the brackets go on the datatype always made more sense to me; after all, I want to refer to memory of a certain cell size (as indicated by int). I realize that there is a lot going on when using new, but I believe that even in C it should read as follows, because you effectively change the datatype (to be of type pointer, rather than int):

    int[5] arr;
But then, I will now duck and get the hell out. This is the same reason why it feels wrong to write:

    int *arr;
You're not changing the identifier, you're trying to change the datatype.

In summary, I believe that the syntax for fiddling with pointers in C is very misleading, and on this I fully agree with the article. But as many people are accustomed to this notation, I will now duck and get far away from the internet, in fear of all the hateful comments explaining to me how I am wrong, and apparently just don't understand the superior beauty of complicated C syntax. I will now go and check my garbage-collected privilege.

> I will now duck and get far away from the internet, in fear of all the hateful comments explaining to me how I am wrong, and apparently just don't understand the superior beauty of complicated C syntax.

I am going to be that person :-)

One phrase – "declarations mirror use". In a declaration, you use the same set of operators around the declared object that you would use in a normal expression. All of these operators (asterisk, `[]` and `()`) have the exact same precedence and associativity as in the rest of the language. The type specifier(s) on the left is the final type of the expression that you get after applying all of the operators in the correct order as per the precedence/associativity rules.

So when you see:

    char *arr[X]
You identify the identifier first: `arr`. Then, because `[]` takes precedence over the asterisk, you say that `arr` is an array (of size X) of pointers to `char`. In other words, the expression `*arr[some_index]` is of type `char`.
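
To see the rule at work, here is a small sketch (the names `arr`, `word`, `p` and `first_name` are made up for illustration, and `static_assert` assumes C11) where each declaration is read inside out and then used with exactly the same operators:

```c
#include <assert.h>

/* [] binds tighter than *, so arr is an array of 3 pointers to char,
   and the expression *arr[i] has type char. */
char *arr[3] = {"apple", "banana", "cherry"};

/* Parentheses change the binding: p is a pointer to an array of 4 char,
   so (*p)[i] is a char. */
char word[4] = "fig";
char (*p)[4] = &word;

/* () also binds tighter than *: first_name is a function returning a
   pointer to char, so *first_name() is a char. */
char *first_name(void)
{
    return arr[0];
}

static_assert(sizeof arr == 3 * sizeof (char *), "three pointers");
static_assert(sizeof *p == 4, "*p is a whole char[4]");
```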

I don't know if "declarations mirror use" gets you all the way there. You can use:

  *some_index[arr]
(though obviously not saying you should), but you can't declare:

  char *5[arr];
Obviously there's good reasons for that, but then you have "declarations mirror use, except when there's good reason not to", which basically brings you back to the original question.
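
The reason `some_index[arr]` compiles at all is that the C standard defines `E1[E2]` as `(*((E1)+(E2)))`, and that addition does not care about operand order. A sketch with hypothetical helper names:

```c
#include <assert.h>

/* E1[E2] means *((E1) + (E2)), and pointer + integer equals
   integer + pointer, so all three functions read the same element. */
int via_subscript(const int *arr, int i) { return arr[i]; }
int via_pointer(const int *arr, int i)   { return *(arr + i); }
int via_swapped(const int *arr, int i)   { return i[arr]; } /* legal, but please don't */
```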

To be fair, while declaration mirrors use, the use contains a few traps for the beginner.

If you have written some assembler, you will see where C is coming from. It just occurred to me that most early C programmers were probably proficient in assembly.

I don't agree with:

    int[5] arr;
being a more readable syntax. The problem is that if you later have:

    int[7] arr2;
It would make sense that arr and arr2 are different types (as the thing on the left is different), yet in practice both decay to `int *`, and you should (hopefully!) be able to use them as arguments to the same function.
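
Strictly, `int[5]` and `int[7]` are distinct types (their sizes differ), but both decay to `int *` when passed to a function, so one function can take either as long as the length travels separately. A sketch; `sum` and the `ARRAY_LEN` macro are hypothetical names, and the macro only works on true arrays, not on pointers:

```c
#include <assert.h>
#include <stddef.h>

/* One function serves arrays of any length: the argument decays to
   a pointer to the first element, so the count must be passed in. */
int sum(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Relies on sizeof seeing the whole array; on a pointer it would
   silently compute the wrong answer. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])
```

Usage: `sum(arr, ARRAY_LEN(arr))` and `sum(arr2, ARRAY_LEN(arr2))` both work, even though `sizeof arr != sizeof arr2`.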

On the other hand I agree that:

    int* arr;
makes more sense than more commonly used:

    int *arr;
although it's messed up anyway as:

   int* a, b, c, d;
will result in a surprise.
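
A short sketch of that surprise, assuming a C11 compiler (for `_Generic` and `static_assert`); the names are placeholders:

```c
#include <assert.h>

/* The * belongs to the declarator a, not to the type specifier int,
   so this declares one pointer and three plain ints. */
int* a, b, c, d;

static_assert(_Generic(a, int *: 1, default: 0), "a is a pointer to int");
static_assert(_Generic(b, int: 1, default: 0), "b is a plain int, not a pointer");

/* Writing the * next to each name, one declaration per line,
   makes the intent explicit. */
int *p;
int *q;
```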

"The Design and Evolution of C++" by Bjarne Stroustrup has a section on alternate declaration syntax that he considered. Compatibility won out but you might find the suggestions interesting.

Java also gets this right, probably in direct response to the crufty C syntax.

> (to be of type pointer, rather than int)

No, the array type is not a pointer, although implicit conversions to a pointer occur, e.g. when passing arrays as function arguments. The practical difference is ... well, I don't know.
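
One way to answer that open question, as a sketch with hypothetical names (assumes C11 for `static_assert` and `_Generic`): an array object carries its size in its type and cannot be assigned to, while a pointer is a separate object that merely holds an address. The difference bites hardest with function parameters, where a declared array is silently adjusted to a pointer:

```c
#include <assert.h>
#include <stddef.h>

/* numbers is an array object: its size is part of its type.
   ptr is a separate object that only holds an address. */
int numbers[10];
int *ptr = numbers;          /* numbers decays to &numbers[0] */

static_assert(sizeof numbers == 10 * sizeof (int), "the whole array");

/* A parameter declared as int param[10] is adjusted to int *param:
   the 10 is ignored, so sizeof param inside such a function would give
   the size of a pointer (GCC and Clang warn about this). */
int param_is_pointer(int param[10])
{
    return _Generic(param, int *: 1, default: 0);
}

/* ptr can be reassigned (ptr = numbers + 1), but "numbers = ptr;"
   does not compile, because an array is not a modifiable lvalue. */
```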

Brings back fond memories of the first time I learned C, when I really had to dig into what the difference was between storage durations (auto/stack, dynamic/heap, static, thread local). It makes it increasingly important to think about when an object is going to be stored, and for how long. To me this is still a really useful concept that most high-level languages seem to have all disregarded in favor of extremely eager GC or refcounting, with the exception of Rust.

C has a really simple model with regard to storage durations, but the usage/omission of the respective keywords (static, extern, auto) is what makes it hard for beginners, IMO.

`static` means different things at file scope and at block scope, `extern` is redundant most of the time (except when linking to an object from another unit), `auto` is 100% redundant and is a leftover from the days when `int` was implied for every declaration.
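
A sketch of the two meanings of `static`, and of `auto` being redundant, using hypothetical function and variable names:

```c
#include <assert.h>

/* File scope: static gives counter_total internal linkage, so other
   translation units cannot see it. (Every file-scope object already has
   static storage duration, with or without the keyword.) */
static int counter_total = 0;

/* Block scope: static changes the storage duration instead. calls is
   initialised once and keeps its value between calls. */
int next_id(void)
{
    static int calls = 0;
    int local = 0;   /* same as: auto int local = 0; (auto is the default) */
    local++;         /* always 1: local is a fresh object on every call */
    calls += local;
    counter_total++;
    return calls;
}

int total_calls(void)
{
    return counter_total;
}
```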

    most high-level languages seem to have all disregarded in
    favor of extremely eager GC or refcounting, with the 
    exception of Rust.
Because Rust does a good job of hiding the fact it isn't a high level language.

Sure, you have ML-style type checking, pointer safety rules, multiple returns. But if you can see past the syntax sugar, you realize it is just C (with guardrails).

The twist with GC or refcounting is that you actually still need to think about memory management; in many ways, unfortunately, it becomes harder to control. An advantage of user-managed memory is that you always need to think about it, with every line you write, so if there is an issue, it will become apparent rather quickly. Buffer overruns, however, are never much fun, and they are largely avoided with automatic memory management, so it's clearly useful in most cases. At least with most of the boring software that I write.

>Brings back fond memories of the first time I learned C, when I really had to dig into what the difference was between storage durations (auto/stack, dynamic/heap, static, thread local)

If those are your fond memories, I'd hate to hear your traumatic ones. :P

I suppose this article is tongue-in-cheek but it doesn't really demonstrate lies in the C language. It does point out some of the quirks of arrays in C but not calling them real arrays is a matter of interpretation of terms. C defined what the term array meant for a lot of the languages that followed it. That today's languages have diverged from C's definition of array is not surprising.

From the OP:

> Of course, if you’ve read this far you’ll (hopefully) realise that this post should have been taken in jest. Arrays aren’t really a lie (any more than any of C’s constructs are). Despite all the ‘trickery’ C’s arrays work well for many, many programming tasks. They are – as the title of this article suggests – a very convenient set of untruths.

It's not that convenient an untruth seeing as these are probably some of the first things you learn in C, and some of the first gotchas that'll getcha.

Having said that, the article was well written.

One of the first things people learn in C is the fallacy that "arrays and pointers are equivalent". An array is a series of contiguously laid-out objects whose size is known at compile time (except for C99 VLAs). A pointer, on the other hand, is merely a "single cell" that is supposed to contain an address; it can be added to, subtracted from, and dereferenced.

The truth is that array expressions decay to pointers except when the array is the operand of the `sizeof` or unary `&` operator, or is a string literal used to initialize an array.
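
A sketch of those exceptions, assuming C11; the standard (C11 §6.3.2.1) also exempts a string literal used to initialize an array. The variable names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

int arr[4] = {1, 2, 3, 4};

/* In most expressions, arr is converted to &arr[0], an int *. */
int *first = arr;

/* As the operand of sizeof, arr keeps its array type ... */
static_assert(sizeof arr == 4 * sizeof (int), "sizeof sees the whole array");
/* ... but arr + 0 is an ordinary expression, so arr has already decayed. */
static_assert(sizeof (arr + 0) == sizeof (int *), "arr + 0 is an int *");

/* As the operand of unary &, arr does not decay either: &arr has type
   int (*)[4], so whole + 1 steps past all four elements at once. */
int (*whole)[4] = &arr;

/* A string literal initialising a char array is copied into the array
   rather than converted to a pointer; the array includes the '\0'. */
char greeting[] = "hi";
static_assert(sizeof greeting == 3, "'h', 'i' and the terminating null");
```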

I wish that were the case, as in "the first things you learn in C", but unfortunately, in most cases I come across, it's not: many C programmers may know the very basics (int* ptr = arr;) but not much beyond that.

These facts are not lies. They are inconveniences that one sees when looking at C from the perspective of a higher-level language. But if you learn to program in assembler before studying C, then all these facts look like obvious and convenient syntactic sugar.

Maybe in C++ these facts become inconvenient, because C++ pretends to be a higher-level language than a cross-platform assembler. But if that is a problem, it is not a problem of C; it is a problem of C++.

Is this article a sign that C has fallen out of mainstream use?

Four answers:

No, C (and C++) is used as much as ever. HN echo chamber aside, Rust and/or Go haven't made much of a dent.

No, we had such articles for decades.

No, it's just an article that points out some issues with C, like those that exist for every language and environment (e.g. tons of articles on JS shortcomings). There is no correlation whatsoever between such an article and the language falling out of mainstream use.

No, this is a bizarro question. It's an article by single person, not some general trend.

> No, C (and C++) is used as much as ever

Is this really true, especially for C? Lots of things that used to be done in C are today done in C++, and lots of things that used to be done in C++ are today done in Java or C#.

In the embedded world, the default language is still C by a wide margin. You have to argue hard to have C++ considered and languages like Rust&Go just aren't on the radar.

Only in the HN world is C considered legacy. For the rest of the world, it's the well-known workhorse of the software world.

> For the rest of the world, it's the well-known workhorse of the software world.

But it's a workhorse that's continually being replaced. I'm not talking about Rust&Go. I'm talking about C++/Java/C#. Thinking about C projects I saw 15-20 years ago, hardly any of them would be written in C if they were started today. And even in the embedded world C++ is becoming more and more of a thing.

Is C used, of course. Is C going away, of course not. Is C "used as much as ever", I just don't see it.

I knew this already 30 years ago. While I am happy to never use C again, it is still useful as a low-level language.

If by "mainstream use" you refer to applications then maybe yes. If you refer to embedded use, for sure not.

Even in the embedded world I'm seeing more and more C++ these days.

True, C++ (especially with C++11/14) is growing, but based on all the recent studies (by outlets like embedded.com), C is still a long way ahead of C++ in the embedded space.

What is "mainstream" these days?

Whatever language you use.

Whatever language I use.

React (a javascript library, sometimes referred to as reactJS or react.js) and more specifically its most popular module, Reagent, which is a full, lazily-loaded preemptive operating system that can run concurrent Java, Pythonjs, Rubyjs programs all from your browser while allowing cooperative suspend, load and save to network or local storage, intertab cooperative process management, etc.

Basically, if you're not working in an add-on to a framework library written in javascript running in a web browser, you might as well be using punch cards.

I made the part about Reagent up, but you know you believed it.

We're so far from the metal we might as well be sending a telegram with our requirements.

In the five seconds it takes this crap to load and show you a still loading page, your CPU cores have done 40,000,000,000 sixty-four bit operations.

I don't know how this is a lie or untruth, even in jest. In a language that exposes memory management directly of course you can manually traverse an array.

I'd be careful about assuming that C code run on a human interpreter behaves similarly to C code compiled by a modern compiler. For example, string literals probably don't actually have to exist in memory anywhere, but could arise implicitly from the control flow of your program. A compiler could probably turn:

    char *x = "abcdefghijklmnopqrstuvwxyz";
    printf("%s", x);
into something like:

    for (int c = 'a'; c <= 'z'; c++) {  /* assumes contiguous letters, as in ASCII */
      putchar(c);
    }
Maybe when your program thinks it's accessing the 42nd element of that array, it's actually accessing some function of (the number of clock cycles in the CPU's counter, n unrelated code segments XOR'ing into a memory location, the executable's exact binary output) and the compiler has conspired to make these calculate to what that string's value would've been in an imaginary virtual machine to save 2 bytes (or because they're cached).

Sounds like a good DRM scheme actually.

And who says pointers are to RAM addresses? Maybe the compiler statically notices that the pointer's target stays strictly between 'a' and 'z', and decides to use a simple 26-value counter. Depending on how you debug a program compiled by a sufficiently smart compiler, pointers could point to RAM addresses only when you're looking at them.

That for loop you thought you wrote? Well, your program accesses different parts of the result at different times, so it scattered it all over your program so it's lazily computed. The loop counter or pointer never actually exists or takes on any value.

You can't be sure any of it exists unless you add logging or inspection. The whole program could be a lie, cleverly calculated to mimic the one you really intended.

> The whole program could be a lie, cleverly calculated to mimic the one you really intended.

The standard has a similarly convoluted way of saying that, in a nutshell, C compilers are permitted all optimizations under the as-if rule. (§ 5.1.2.3)
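
As a concrete illustration, here is a loop and a closed-form replacement with the same observable result. Optimisers may perform this kind of rewrite (Clang, for example, often does at -O2), but whether a given compiler does so depends on the compiler and optimisation level. Both function names are hypothetical:

```c
#include <assert.h>

/* What you wrote: an explicit loop that adds 1 + 2 + ... + n. */
unsigned long sum_loop(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; i++)
        total += i;
    return total;
}

/* What an optimiser may legally emit instead under the as-if rule:
   n(n+1)/2, halving whichever factor is even so nothing is lost.
   Gives the same result as sum_loop for every n below ULONG_MAX. */
unsigned long sum_closed_form(unsigned long n)
{
    return n % 2 == 0 ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
}
```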

The "canonical" mental memory model of C (globals in data/BSS, locals on the stack, memory is a big array, code and data are the only artifacts) may be utterly useless when reasoning about the performance of a program, but helps the programmer greatly when approaching a problem.

The language was designed with these things in mind, no matter how much more sophisticated compilers and hardware have become, and no matter how much language lawyer fetishists frown on you for saying "this is on the stack" instead of "this has automatic storage duration".

You're describing a general problem in general terms...

Does a program that has no side effects even exist?? ooOOOoh, spooky....

BTW, yes I know what you're getting at... optimisers are allowed to perform any transformation as long as they're semantically equivalent. This applies to all languages.

> This applies to all languages.

Sure, but it's common for C programmers to think C is a low-level language with concepts that map straightforwardly to the target machine. You can't simultaneously think that C is a low-level language "close to the machine" and that your program can be freely rewritten into an eldritch horror. That's the only reason it would be a good "lie".

Contrast that with something like Perl, where people accept that an array is whatever Larry Wall wants it to be.

