
A convenient untruth: Array notation in C is a lie - ingve
https://blog.feabhas.com/2016/12/a-convenient-untruth/
======
cessor
The equivalent in C# always made more sense to me (not comparing memory or
allocation model between the languages, but simply syntax in relation to a
person reading it):

    
    
        int[] arr = new int[5]; // C#
    
        int arr[5]; // C
    
    

The fact that the brackets go on the datatype always made more sense to me,
after all, I want to refer to memory of a certain cell size (as indicated by
int). I realize that there is a lot of stuff going on when using new, but I
believe even in C it should read, because you effectively change the datatype
(to be of type pointer, rather than int).

    
    
        int[5] arr;
    

But then, I will now duck and get the hell out. This is the same reason why it
feels wrong to write:

    
    
        int *arr;
    

You're not changing the identifier, you're trying to change the datatype.

In summary, I believe that the syntax for fiddling with pointers in C is very
misleading, and on this I fully agree with the article, but as many people are
accustomed to this notation, I will now duck and get far away from the
internet, in fear of all the hateful comments explaining to me how I am wrong,
and apparently just don't understand the superior beauty of complicated C
syntax. I will now go and check my garbage collected privilege.

~~~
bluetomcat
> I will now duck and get far away from the internet, in fear of all the
> hateful comments explaining to me how I am wrong, and apparently just don't
> understand the superior beauty of complicated C syntax.

I am going to be that person :-)

One phrase – "declarations mirror use". In a declaration, you use the same set
of operators around the declared object that you would use in a normal
expression. All of these operators (asterisk, `[]` and `()`) have the exact
same precedence and associativity as in the rest of the language. The type
specifier(s) on the left is the final type of the expression that you get
after applying all of the operators in the correct order as per the
precedence/associativity rules.

So when you see:

    
    
        char *arr[X]
    

You identify the identifier first: `arr`. Then, because `[]` takes precedence
over the asterisk, you say that `arr` is an array (of size X) of pointers to
`char`. In other words, the expression `*arr[some_index]` is of type `char`.
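
A minimal sketch of the same reading rule, for what it's worth. The helper names `first_char` and `nth_of_row` are made up for illustration; the point is only that each parameter declaration has the same shape as the expression in the function body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: first character of the string stored at slot i.
 * The declaration `const char *arr[3]` reads like the expression `*arr[i]`:
 * index first, then dereference, and the result is a (const) char. */
static char first_char(const char *arr[3], size_t i)
{
    assert(i < 3);
    return *arr[i];
}

/* With parentheses the order flips: `(*row)[i]` dereferences first and
 * indexes second, so `row` is a pointer to an array of 3 chars. */
static char nth_of_row(const char (*row)[3], size_t i)
{
    assert(i < 3);
    return (*row)[i];
}
```

Called with `const char *words[3] = { "alpha", "beta", "gamma" };`, `first_char(words, 1)` gives `'b'`; called with the address of a `const char abc[3]`, `nth_of_row(&abc, 2)` gives the third element.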

~~~
mnarayan01
I don't know if "declarations mirror use" gets you all the way there. You can
use:

    
    
      *some_index[arr]
    

(though obviously not saying you _should_ ), but you can't declare:

    
    
      char *5[arr];
    

Obviously there's good reasons for that, but then you have "declarations
mirror use, except when there's good reason not to", which basically brings
you back to the original question.

~~~
chc
> char *5[arr]

Isn't that exploiting a quirk of the operator's definition, rather than how
the operator is intended to be used?

~~~
mnarayan01
I assume you're right, though I would _love_ to see some of the original
discussion around it. That said, while I don't think it's a catastrophic point
against "declarations mirror use", I do think it's a strike against it.

Edit: I guess I'll also note here just for fun that while

    
    
      char [5]arr;
    

seems reasonable enough,

    
    
      char [4][5]arr1;
      char[5][4] arr2;
    

seems less so.

------
rossy
This gets weird when you compare it to how structs work in C. Both are complex
data types, so I sometimes forget that array semantics are totally different
to struct/union semantics. Unlike arrays, in ANSI C, structs are real value
types. You can pass them by-value to functions, return them by-value from
functions and assign them by-value to variables of the same type. Also,
structs never work like pointers to themselves. You have to dereference the
pointer or use the -> syntactic sugar to access a member of a pointer to a
struct.

Value type semantics enable some neat things, for example, you can zero all
the members of a struct by assigning a compound literal to it. You can also
make simple structs, like three uint8_ts for an RGB colour, and use them just
like they were primitive types. In comparison, it seems almost archaic that
you have to break out memset() and memcpy() to zero and move arrays.
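
To make the contrast concrete, here is a small sketch. The `struct rgb` type and the function names are assumptions made up for this example, not anything from the article.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical RGB type: three uint8_t members, as described above. */
struct rgb { uint8_t r, g, b; };

/* Structs are passed and returned by value, like an int would be. */
static struct rgb darken(struct rgb c)
{
    c.r /= 2; c.g /= 2; c.b /= 2;   /* modifies the local copy only */
    return c;
}

/* Zero a struct by assigning a compound literal (C99). */
static void reset_rgb(struct rgb *c)
{
    *c = (struct rgb){0};
}

/* The array equivalent needs memset(), because `a = {0}` is not valid
 * as an assignment. */
static void reset_triplet(uint8_t a[3])
{
    memset(a, 0, 3 * sizeof a[0]);
}
```

After `struct rgb dark = darken(orange);` the original `orange` is unchanged, which is exactly the value semantics that plain arrays lack.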

~~~
mojuba
Worth noting that you can always wrap an array in a structure and kick it
around in your program happily, as if it's how the language should've been.

    
    
        struct { int a[5]; } arr5 = {{4, 3, 2}};
    

What's a bit weird, though, is that while initialization like the above is
possible, assignment of a constant is not as smooth in C: you will have to
typedef your structure and use it in a compound literal, which looks like a
typecast expression:

    
    
        typedef struct { int a[5]; } Array5;
        ...
        arr5 = (Array5){{4, 3, 2}};

~~~
gpderetta
You can actually drop the extra brackets (at least in C++, but IIRC also in
C).

C++11 has array<T, size_t> which has proper value semantics. It is implemented
exactly like your structure above, so no overhead, and also has a proper
operator== and assignment via initializer list.

------
jcoffland
I suppose this article is tongue-in-cheek but it doesn't really demonstrate
_lies_ in the C language. It does point out some of the quirks of arrays in C
but not calling them _real_ arrays is a matter of interpretation of terms. C
defined what the term array meant for a lot of the languages that followed it.
That today's languages have diverged from C's definition of array is not
surprising.

~~~
burntsushi
From the OP:

> Of course, if you’ve read this far you’ll (hopefully) realise that this post
> should have been taken in jest. Arrays aren’t really a lie (any more than
> any of C’s constructs are). Despite all the ‘trickery’ C’s arrays work well
> for many, many programming tasks. They are – as the title of this article
> suggests – a very convenient set of untruths.

------
ordu
These facts are not lies. They are inconveniences that one sees when looking
at C from the perspective of a higher-level language. But if you learn to
program in assembler before studying C, then all these facts look like
obvious and convenient syntactic sugar.

Maybe in C++ these facts become inconvenient, because C++ pretends to be a
higher-level language than a cross-platform assembler. But if that is a
problem, it is not a problem of C, it is a problem of C++.

~~~
gpderetta
None of those 'lies' have anything to do with assembler. C deviated from
regularity for the sake of convenience in some cases, and now we are stuck
with those bad decisions forever.

~~~
ordu
> None of those 'lies' have anything to do with assembler.

They do. Let's take a look at the very first one. "Array name is just a
pointer".

An assembler "array" is the name of a label. So it's just a named address in
memory. Or we can alternatively say that an assembler array is a constant
pointer.

`sizeof' returns the size of an array in bytes? Hmm... maybe that's because
the main C abstraction for memory is the assembler one: memory is a continuous
sequence of bytes? `sizeof' is meant to be used with functions like malloc or
memcpy, not with operator `new'. When we use dynamic memory allocation in
assembler we get a pointer to an untyped memory chunk; compare with C:

    void* malloc(size_t size);

If you wish I can show you the connections of other 'C lies' with assembler
abstractions. I'm too lazy to write about all of them, but I could write about
one more if you ask. Just pick one you like more.

> now we are stuck with those bad decisions forever

Yes, you are right. We are stuck with that. And it is bad. But it doesn't make
my point wrong. C is a cross-platform assembler, and these decisions look
pretty good from the perspective of assembler. They give the programmer
low-level control over the generated machine code while keeping code portable,
and that's very useful in some cases. For example, when you're developing an
OS kernel.

~~~
gpderetta
From an assembler point of view a structure of N elements of type T and an
array of T[N] have exactly the same layout and are accessed in exactly the
same way [1], but in C have wildly different semantics.

Sizeof behaves exactly the same way for structs and arrays, so it is one of
the few things in C that treat arrays "correctly".

[1] although usually the offset is constant for a struct field access.

~~~
ordu
> but in C have wildly different semantics.

Can you show it with example? I'm not sure what you mean exactly.

Look:

    
    
        struct {
            int a, b, c;
        } foo;
        int *pfoo = (int*)&foo;
        printf("%d, %d, %d\n", pfoo[0], pfoo[1], pfoo[2]);
    

Code like this can have problems due to alignment, but in my opinion it's not
"wildly different semantics".

~~~
gpderetta
In addition to being UB, the example doesn't illustrate the issue: arrays in C
are not first class as they can't be passed by value and can't be assigned.
The decay-to-pointer thing that prevents this regularity has nothing to do with
asm.

~~~
ordu
> In addition to being UB,

Yes, it's UB. But I'm not trying to persuade you to use this UB in real code:
in real C code use offsetof from stddef.h. The only thing I want to say is:
this code would work everywhere (if you pay attention to alignment). And it's
not a coincidence: C mimics asm, because C needs to be 100% predictable to the
coder. Asm uses the simplest and most obvious abstractions, with predictable
runtime costs. C also goes this way. So it's inevitable for my code to work.
With some precautions, but it would work everywhere.

> the example doesn't illustrate the issue...

Yes, I suggested it, and I asked you for some illustrative example, because I
can't follow your reasoning from "arrays are not first class" to "nothing
to do with asm". I see it the other way: "arrays are not first class" _is_
"asm mode".

~~~
gpderetta
> this code would work everywhere it does not, it will be miscompiled by
> modern compilers.

> I asked you for some illustrative example,
    
    
      void foo(T x) { x[0] = 1; }
      T x = {0};
      foo(x);
      assert(x[0] == 0);
    

The assertion fails for T = char[1], but succeeds for T = std::array<char, 1>.
You could construct a similar example in pure C.

std::array and C arrays compile down to the exact same code for access, have
the exact same layout, etc., but C arrays are not copyable or assignable and
implicitly convert to pointers without any good reason. This has nothing to do
with assembler whatsoever.
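
A pure C version of that example might look like the sketch below. The wrapper type `struct boxed` and both function names are invented for illustration; the struct stands in for `std::array`.

```c
#include <assert.h>

/* Hypothetical wrapper type: a one-element array inside a struct. */
struct boxed { char data[1]; };

/* The parameter type `char x[1]` is adjusted to `char *x`, so this writes
 * through to the caller's array. */
static void set_raw(char x[1])
{
    x[0] = 1;
}

/* A struct parameter is copied, so the caller's object is untouched.
 * The return value is the modified local copy's element. */
static char set_boxed(struct boxed x)
{
    x.data[0] = 1;
    return x.data[0];
}
```

After `set_raw(raw)` the caller's `raw[0]` is 1, while after `set_boxed(b)` the caller's `b.data[0]` is still 0, even though both types have the same layout.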

~~~
ordu
>> this code would work everywhere

> it does not, it will be miscompiled by modern compilers.

Sorry, due to formatting bug I overlooked this.

Can you show me an example of such a modern compiler? I suspect that you mean
some C++ compiler, and those probably would `miscompile' my example, because
they treat a struct in a manner similar to a class with a vtable and all the
other stuff. But we are speaking about C, not C++. If I'm mistaken in my
assumptions, I'd like to know about a modern C compiler which proves me wrong.
Such a proof would help me understand modern C much better.

~~~
gpderetta
It is hard for compilers to miscompile this specific example as it doesn't do
much at all.

The idea is that a write to pfoo[1] couldn't possibly alias with any write to
foo, so the compiler should be free to reorder accesses if profitable. This is
the same in C and C++ and has nothing to do with vtables.

For what it's worth, I couldn't get gcc, clang or icc to miscompile [¹] a
slightly changed example, so either it is not actually UB or compilers still
refrain from making this kind of optimization as it would break way too much
code.

[¹] i.e. they elect to reload from the struct after writing to the array and
vice versa even when it would be profitable not to do so.

------
Animats
I once proposed a backwards-compatible way out of this: "Safe Arrays for
C"[1]. The fundamental problem with arrays in C is that the compiler has no idea
how big they are. My proposal was to replace

    
    
        int read(int fd, char buf[n], size_t n);
    

with a safe form

    
    
        int read(int n; int fd, char (&buf)[n], size_t n);
    

This says that the size of "buf" is "n", which comes in as another parameter.
There are no array descriptors; the generated code for a call is the same.
Thus, this is backwards-compatible, allowing mixing of "regular C" and "safe
C" modules.

The programmer has to know how big the array is, after all. There must be some
way to compute the array size from other variables or constants, or the
program has no hope of working. All C needs is a way to allow the programmer
to say that in the language. Then subscript checking is possible. Buffer
overflows can be eliminated.

The required changes to C are minor. The big one is adding C++ references.
Instead of passing a pointer to the first element of an array, you pass a
reference to the array. Same object code, but now arrays are first-class
objects.

This was discussed at length on the C standards digest back in 2012. After
many revisions, the conclusion was that it was technically feasible, but too
difficult politically.

[1]
[http://www.animats.com/papers/languages/safearraysforc43.pdf](http://www.animats.com/papers/languages/safearraysforc43.pdf)

~~~
chrisseaton
> The fundamental problem with arrays in C is that the compiler has no idea how
> big they are

Isn't that the compiler's choice, to not know how big they are? You could
write a standards compliant implementation of C that did track how big arrays
were if you wanted to couldn't you?

~~~
Animats
That's been done, with "fat pointers". GCC used to have an option for that,
but it wasn't used much. The overhead is all at run time and is
substantial.[1]

[1]
[http://www.imperial.ac.uk/pls/portallive/docs/1/18619746.PDF](http://www.imperial.ac.uk/pls/portallive/docs/1/18619746.PDF)

------
kazinator
> _Why is this [array assignment] failing? Because the array’s name is a lie!
> Using a variable as an expression normally yields its value, but in the case
> of arrays the array name yields a pointer (to the first element; which is at
> least reasonable)_

Sorry, no; the assignment fails because an array isn't a modifiable lvalue.
Array assignment simply isn't supported.

If array assignment were supported the array-to-pointer conversion ("decay")
could be suppressed in that case to make it work. Just like it is suppressed
when an array is the operand of sizeof or & (address-of).

Assignment of arrays is supported when they are struct/union members:

    
    
      struct wrapper {
        int a[5];
      } x = { 0 }, y = x;
    

We can return this from a function, too.

------
spacelizard
Brings back fond memories of the first time I learned C, when I really had to
dig into what the difference was between storage durations (auto/stack,
dynamic/heap, static, thread local). It makes it increasingly important to
think about when an object is going to be stored, and for how long. To me this
is still a really useful concept that most high-level languages seem to have
all disregarded in favor of extremely eager GC or refcounting, with the
exception of Rust.

~~~
valarauca1

> most high-level languages seem to have all disregarded in favor of extremely
> eager GC or refcounting, with the exception of Rust.
    

Because Rust does a good job of hiding the fact it isn't a high level
language.

Sure you have ML type checking, pointer safety rules, multiple returns. But if
you can see past the syntax sugar you realize it is just C (with guardrails).

~~~
Cursuviam
High level abstraction or syntax sugar?

Terrorist or freedom fighter?

------
pdog
Why did people think otherwise? Chapter 5 of _K&R_[1] is titled _Pointers and
Arrays_ and basically explains that arrays and pointers are equivalent.

[1]: [https://www.amazon.com/Programming-Language-Brian-W-
Kernigha...](https://www.amazon.com/Programming-Language-Brian-W-
Kernighan/dp/0131103628/)

~~~
notacoward
The point of the article is that sometimes they're _not_ equivalent, and that
creates a lot of confusion. Please read it before commenting on it.

~~~
ams6110
I thought the same things as GP. Anyone who has read K&R (as any C programmer
ought to have) will know everything in this article.

~~~
notacoward
I don't think that's true. While I don't have my copy of K&R handy, I don't
recall it covering all the subtleties of how assignment and increment
operators and sizeof will work differently for something declared as an array
vs. something declared as a pointer. At least in the edition I've had, it just
had the same "pointers and arrays are equivalent" which is misleading in
exactly the way this article describes. Did that get added in your much-later
edition?

------
anjc
It's not that convenient an untruth seeing as these are probably some of the
first things you learn in C, and some of the first gotchas that'll getcha.

Having said that, the article was well written.

~~~
bluetomcat
One of the first things that people learn in C is the fallacy that "arrays
and pointers are equivalent". An array is a series of contiguously laid-out
objects that has a size known at compile time (except C99 VLAs). A pointer, on
the other hand, is merely a "single cell" that is supposed to contain an
address and can be added to, subtracted from, and dereferenced.

The truth is that array names decay to pointers except when the array is an
operand to the `sizeof` or `&` operator.
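
Both exceptions can be observed directly. The sketch below is only an illustration; `ARRAY_LEN` and `step_difference` are names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a real array. Only valid where `a` is still an array;
 * applied to a pointer it silently computes the wrong thing. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical demo: returns how many bytes `&arr + 1` and `arr + 1`
 * are apart for a 5-element int array. */
static size_t step_difference(void)
{
    int arr[5] = {0};

    /* `arr + 1`: arr decays to int *, so the step is one int. */
    char *after_first = (char *)(arr + 1);

    /* `&arr + 1`: no decay, &arr is int (*)[5], so the step is the
     * whole array. */
    char *after_all = (char *)(&arr + 1);

    return (size_t)(after_all - after_first);
}
```

The function returns `4 * sizeof(int)`: the `&` form stepped over all five elements, the decayed form over one.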

~~~
jandrese
Right, the fact that arrays aren't pointers is academic when they decay to
pointers at the drop of a hat. In fact it's so easy to decay the array to a
pointer that it is generally best to always treat it as a pointer lest you get
burned later on during a code refactor. This mostly means never using sizeof()
to get the size of an array.

~~~
bluetomcat
Even if we simplify it this way, arrays are still different because they
cannot be assigned to and their decayed pointer can never be NULL (they always
have a backing storage provided by the compiler):

    
    
        int arr[5];
        int *ptr;
    
        // "arr" has a fixed backing storage of sizeof(int) * 5 bytes
        // "ptr" may point to anything or be NULL, depending on control flow

~~~
jandrese
Best to check for NULL anyway though, because anything can happen once you let
it decay to a pointer. Never assigning them is a good idea though, maybe it's
best to think of them as constant pointers? But that's more confusing
terminology for a C programmer, so maybe not. People get really wrapped around
the axle when differentiating a constant pointer from a pointer to a constant.

------
utopcell
C now has static array indices. For example

    void func(int arr[static 8]) {}

sets a lower limit on the size of the array that can be passed as an argument
(you must not pass an array of fewer than 8 elements).

I'd suggest that to the author, but given the article, I fear it may give him
a heart attack.

~~~
ThatGeoGuy
I know they're called static array indices, and they're called that because of
the use of the keyword `static`, but they don't have to be compile time
constant at all (and checks aren't performed at compile time, IIRC). You can
have the following:

    
    
        void foo(size_t len, int arr[static len]);
    

Which is really useful in asserting that you won't pass in the null pointer at
runtime (so you can remove any `if (arr) {}` checks). Taken further, this
exact method is what makes the restrict keyword usable in practice. One of the
main problems of the restrict keyword is that you shouldn't be aliasing
pointers. By ensuring that your passed-in array isn't actually a null pointer
using the above syntax, you avoid one of the biggest problems of aliasing:
passing in two null pointers. Consider:

    
    
        void foo1(size_t len1, int *restrict arr1, size_t len2, int *restrict arr2);
    

vs

    
    
        void foo2(size_t len1, int arr1[restrict static len1], size_t len2, int arr2[restrict static len2]);
    

The second is more verbose, but you have a stronger guarantee that this
procedure won't be called with pointers aliased to NULL. The compiler can (and
I believe in the case of GCC, will) take advantage of this.
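
For reference, a minimal sketch of a function defined with this syntax. The name `sum` is an assumption for the example, and whether a given compiler diagnoses a bad call is implementation-specific.

```c
#include <assert.h>
#include <stddef.h>

/* `arr[static len]` promises the callee that `arr` points to at least
 * `len` ints. Breaking the promise is undefined behavior; the compiler
 * is not required to diagnose it. */
static long sum(size_t len, const int arr[static len])
{
    /* The bound of a variably modified parameter must be greater than
     * zero, so callers are expected to pass len > 0. */
    assert(len > 0);

    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += arr[i];
    return total;
}
```

With `int v[4] = {1, 2, 3, 4};` the call `sum(4, v)` yields 10. Inside the function `arr` is still just a pointer, so `sizeof arr` would not give the array size.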

~~~
clarry
> Taken further, this exact method is what makes the restrict keyword usable
> in practice. One of the main problems of the restrict keyword is that you
> shouldn't be aliasing pointers. [..] one of the biggest problems of
> aliasing: passing in two null pointers

You shouldn't be dereferencing NULL pointers. Aliasing them is perfectly fine.

The non-aliasing requirements of restrict kick in only when you are actually
accessing (and modifying!) the object referenced by an lvalue based on an
expression of the restrict qualified pointer. So NULL pointers don't matter
because first, they do not point to an object, and second, if you dereference
them, you're already in UB land anyway. Correct code will not dereference NULL
pointers, therefore the restrict qualification means absolutely nothing in
code that opts not to try to access anything through NULL pointers.

EDIT:

N1256 6.7.3.1p4 under Formal definition of restrict ( _emphasis_ mine):

> During each execution of B, let L be any lvalue that has &L based on P. _If
> L is used to access the value of the object X that it designates, and X is
> also modified_ (by any means), then the following requirements apply: T
> shall not be const-qualified. Every other lvalue used to access the value of
> X shall also have its address based on P. Every access that modifies X shall
> be considered also to modify P, for the purposes of this subclause. If P is
> assigned the value of a pointer expression E that is based on another
> restricted pointer object P2, associated with block B2, then either the
> execution of B2 shall begin before the execution of B, or the execution of
> B2 shall end prior to the assignment. If these requirements are not met,
> then the behavior is undefined.
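
As a sketch of what that means in practice (the function `copy_ints` is hypothetical, written only for this illustration): a call that passes the same NULL pointer twice is fine as long as nothing is accessed through it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical copy routine. `restrict` promises that, for the elements
 * actually accessed, dst and src do not overlap. */
static size_t copy_ints(int *restrict dst, const int *restrict src, size_t n)
{
    size_t copied = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
        copied++;
    }
    return copied;
}
```

`copy_ints(NULL, NULL, 0)` never dereferences either pointer, so the restrict requirements are never triggered and the call simply returns 0.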

------
Vanit
I don't know how this is a lie or untruth, even in jest. In a language that
exposes memory management directly of course you can manually traverse an
array.

~~~
allemagne
It's just a different way of looking at C that might resonate with someone new
to the language and the idea of exposed memory. There's no "lying" but by
framing it like a story or an evil conspiracy it might make it interesting or
fun enough to stick when a clinical description might not for many students.

------
yason
C has no more real arrays than assembly: for the cpu, it's just differently
indexed pointers in the end. But arrays in C have some notable differences
from pointers in C however, please read a good detailed description here:
[http://eli.thegreenplace.net/2009/10/21/are-pointers-and-
arr...](http://eli.thegreenplace.net/2009/10/21/are-pointers-and-arrays-
equivalent-in-c)

------
hardlianotion
Is this article a sign that C has fallen out of mainstream use?

~~~
pjc50
What is "mainstream" these days?

~~~
logicallee
React (a javascript library, sometimes referred to as reactJS or react.js) and
more specifically its most popular module, Reagent, which is a full, lazily-
loaded preemptive operating system that can run concurrent Java, Pythonjs,
Rubyjs programs all from your browser while allowing cooperative suspend, load
and save to network or local storage, intertab cooperative process management,
etc.

Basically, if you're not working in an add-on to a framework library written
in javascript running in a web browser, you might as well be using punch
cards. /s

I made the part about Reagent up, but you know you believed it.

We're so far from the metal we might as well be sending a telegram with our
requirements.

In the five seconds it takes this crap to load and show you a still loading
page, your CPU cores have done 40,000,000,000 sixty-four bit operations.

~~~
greenshackle2
You laugh, but Odoo*'s hand-rolled, Backbone.js-based frontend framework
includes an interpreter for a subset of Python. They call it py.js. I'm not
fucking with you.

[https://github.com/odoo/odoo/tree/10.0/addons/web/static/lib...](https://github.com/odoo/odoo/tree/10.0/addons/web/static/lib/py.js)

Just so that you can experience the full horror: Yes, the Python server ships
XML templates with raw embedded Python code to the client. The client then
parses the XML and interprets the Python using this py.js thing.

If the embedded Python needs to access the database (and it almost always
does), the client makes calls to a JSONRPC interface on the server.

(If you ever think of using Odoo. Don't. Just... don't.)

*An open source ERP.

------
MichaelBurge
I'd be careful about assuming that C code run on a human interpreter behaves
similarly to C code compiled by a modern compiler. For example, string
literals probably don't actually have to exist in memory anywhere, but could
arise implicitly from the control flow of your program. A compiler could
probably turn:

    
    
        char *x = "abcdefghijklmnopqrstuvwxyz";
        printf("%s", x);
    

into something like:

    
    
        for (int x = 'a'; x <= 'z'; x++) {
          putchar(x);
        }
    

Maybe when your program thinks it's accessing the 42nd element of that array,
it's actually accessing some function of (the number of clock cycles in the
CPU's counter, n unrelated code segments XOR'ing into a memory location, the
executable's exact binary output) and the compiler has conspired to make these
calculate to what that string's value would've been in an imaginary virtual
machine to save 2 bytes(or because they're cached).

Sounds like a good DRM scheme actually.

And who says pointers are to RAM addresses? Maybe the compiler statically
notices that the pointer's target stays strictly between 'a' and 'z', and
decides to use a simple 26-value counter. Depending on how you debug a program
compiled by a sufficiently smart compiler, pointers could point to RAM
addresses only when you're looking at them.

That for loop you thought you wrote? Well, your program accesses different
parts of the result at different times, so it scattered it all over your
program so it's lazily computed. The loop counter or pointer never actually
exists or takes on any value.

You can't be sure any of it exists unless you add logging or inspection. The
whole program could be a lie, cleverly calculated to mimic the one you really
intended.

~~~
dom0
> The whole program could be a lie, cleverly calculated to mimic the one you
> really intended.

The standard has a similarly convoluted way of saying that, in a nutshell, C
compilers are permitted all optimizations under the as-if rule. (§ 5.1.2.3)

~~~
bluetomcat
The "canonical" mental memory model of C (globals in data/BSS, locals on the
stack, memory is a big array, code and data are the only artifacts) may be
utterly useless when reasoning about the performance of a program, but helps
the programmer greatly when approaching a problem.

The language was designed with these things in mind, no matter how much more
sophisticated compilers and hardware have become, and no matter how much
language lawyer fetishists frown on you for saying "this is on the stack"
instead of "this has automatic storage duration".

------
MaulingMonkey
> Why is this failing? Because the array’s name is a lie! Using a variable as
> an expression normally yields its value, but in the case of arrays the array
> name yields a pointer (to the first element; which is at least reasonable)

Ahh, not quite. sizeof(array) was using the variable as an expression - not an
evaluated expression, but an expression nonetheless - and it's clearly not
giving us the same result as sizeof(array+0). In C++, you can even construct
references, which can be abused in conjunction with templates to create a
'safe' array size check, which relies on the array keeping its array type:

    
    
      int (&r)[5] = array;
    

Now, arrays _are implicitly converted to_ pointers if you so much as sneeze in
the same room as them, but there are instances (namely arrays of arrays and
the like) where you can fuck up your pointer math if you assume that simply
using the array name yields a pointer, or that the array 'is' a pointer -
because _that_ is the lie! If I'm feeling particularly explicit, I'll write
something like (assuming a and b are arrays, in C++ again):

    
    
      std::copy(a+0, a+N, b+0);
    

Where the +0s ensure I'm actually dealing with pointers. This avoids any
compiler errors from having mixed types for 'a' (array) and 'a+N' (pointer)
which, while rare (the former typically converts to a pointer at some point),
has happened to me at least once.
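
The arrays-of-arrays case is easy to check in plain C. This is only a sketch, and `steps` is a made-up name: it measures how far `m + 1` and `m[0] + 1` land from the start of a 3x4 grid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo on a 3x4 grid: how far apart, in bytes, are
 * `m + 1` and `m[0] + 1` from the start of the array? */
static void steps(size_t *row_step, size_t *elem_step)
{
    int m[3][4] = {{0}};
    char *base = (char *)m;

    /* `m` decays to int (*)[4], so +1 skips a whole row of 4 ints. */
    *row_step = (size_t)((char *)(m + 1) - base);

    /* `m[0]` decays to int *, so +1 skips a single int. */
    *elem_step = (size_t)((char *)(m[0] + 1) - base);
}
```

The two results are `4 * sizeof(int)` and `sizeof(int)`, so treating `m` as if it were simply an `int *` gives pointer math that is off by a factor of the row length.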

The real reason "this" (array init and assignment) is failing is that C
decided arrays weren't copyable and assignable like this. That's all. Really!
Now, one can think of plenty of rationale that made sense at the time (memcpy
is more explicit, simplifies the implementation to only implement
copy/assignment for simpler types, etc.) but it ultimately boils down to the
choice of the implementors.

------
coldpie
The thing to keep in mind is Never (Never) use array syntax in your function
arguments. It implies something that you can't rely on. More on this:
[https://lkml.org/lkml/2015/9/3/428](https://lkml.org/lkml/2015/9/3/428)

~~~
ThatGeoGuy
See
[https://news.ycombinator.com/item?id=13237674](https://news.ycombinator.com/item?id=13237674)
and my corresponding reply, where you _should_ use array syntax for function
arguments, but you need to do it using the `static` index syntax.

------
joeld42
I was lucky to learn C with pointers first, and then arrays. When you think
about it as just chunks of memory, it all makes sense and is easier to reason
about what the cpu will do. This is another example of a "simplifying
abstraction" that is more misleading than simplifying.

------
rhinoceraptor

      But this is exactly what the index operator is doing.  When you write (for example)
      arr[0]
      The compiler is re-writing your code as:
      *(arr + 0)
    

Since addition is commutative, you can also write arr[0] as 0[arr].
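
A quick sketch to convince yourself (the function name `same_element` is invented for this example):

```c
#include <assert.h>

/* All three spellings name the same element, because a[i] is defined
 * as *(a + i) and addition commutes. The last form is legal but should
 * stay a curiosity. */
static int same_element(const int *a, int i)
{
    assert(a[i] == *(a + i));
    assert(a[i] == i[a]);
    return i[a];
}
```

With `int v[3] = {10, 20, 30};`, `same_element(v, 2)` returns 30 and neither assertion fires.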

~~~
bonoboTP
Yes, it's in the article.

------
leonatan
These are no "lies", just misunderstanding on the part of those that believe
the untruths. Those that have basic understanding of C know most of the things
listed in the article.

------
tlan
The mycodeschool channel on youtube has a playlist[1] that provides a great
visual explanation of this information.

I found it really useful in getting my head around these concepts.

[1]:
[https://www.youtube.com/playlist?list=PL2_aWCzGMAwLZp6LMUKI3...](https://www.youtube.com/playlist?list=PL2_aWCzGMAwLZp6LMUKI3cc7pgGsasm2_)

------
jibsen
Regarding #3, it's worth noting that even though the elements of the two-
dimensional array form a contiguous block of integers, you cannot treat them
as such [1].

[1]:
[http://c-faq.com/aryptr/ary2dfunc2.html](http://c-faq.com/aryptr/ary2dfunc2.html)

------
wbkang
In the last example,

> char string[] = "Hello world";

I thought that gives out warning these days. Isn't the proper type of a string
literal the following?

> const char string[] = "Hello world";

Therefore, you can't really modify individual characters there.

~~~
coldpie
> Isn't the proper type of a string literal the following?

No, it's a const char pointer. Your declaration actually makes a copy of the
"Hello world" string into a new char array, distinct from the string literal
itself:

    
    
      [~]$ cat test.c
      #include <stdio.h>
      
      int main(int argc, char ** argv)
      {
          char a[] = "test2";
          printf("sizeof(a): %lu\n", sizeof(a));
          return 0;
      }
      [~]$ gcc -std=c99 -pedantic -Wall -o test test.c
      [~]$ ./test
      sizeof(a): 6
      [~]$

~~~
copascetic
The type of a string literal is array of char, and like other arrays, they
decay to pointers to the first element when used in expression context, with
three exceptions: sizeof, taking the address with &, and when used as a string
initializer. "test2" in your example is not a const char *, it's an
initializer, since this is one of the exceptions to array-to-pointer decay.

[http://c-faq.com/aryptr/aryptrequiv.html](http://c-faq.com/aryptr/aryptrequiv.html)
[http://c-faq.com/decl/strlitinit.html](http://c-faq.com/decl/strlitinit.html)
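
A small sketch of the three cases, with made-up function names. It assumes nothing beyond standard C:

```c
#include <assert.h>
#include <stddef.h>

/* sizeof sees the literal as an array of 6 chars ('t','e','s','t','2','\0'),
   not as a pointer, because sizeof is one of the decay exceptions. */
size_t literal_size(void)
{
    return sizeof("test2");
}

/* Initializing an array copies the literal; the copy is ordinary
   writable storage, so modifying it is fine. */
char first_char_after_edit(void)
{
    char a[] = "test2";
    assert(sizeof(a) == sizeof("test2"));
    a[0] = 'b';
    return a[0];
}

/* In an ordinary expression the literal decays to a pointer to its
   first char. Writing through such a pointer would be undefined. */
char first_char_via_pointer(void)
{
    const char *p = "test2";
    return *p;
}
```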

------
valbaca
This is covered in "Expert C Programming: Deep C Secrets"

[https://www.amazon.com/gp/product/0131774298](https://www.amazon.com/gp/product/0131774298)

------
pklausler
If you want Pascal, you know where to find it. C is C.

------
Radle
So the only difference between an array and an int * is how sizeof behaves?

And how do I pass my Array to a function without losing sight of its size?

~~~
optymizer
Short answer: For the user? yes. For the type system? no. To keep track of the
length of the array, you can pass the number of elements as a second parameter
to the function.

Long answer: For arrays with a declared length, the length is included in its
type (6.2.5.20, page 42, [1]). Therefore, the type of "int a[5]" is "array of
5 integers". The type of "int*" is "pointer to integer". For arrays without a
length, the type is considered 'incomplete' (6.2.5.22).

So the C typing system considers these 2 different types.

"Except when it is the operand of the sizeof operator or the unary & operator,
or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to
type’’ that points to the initial element of the array object and is not an
lvalue." (6.3.2.1.3)

sizeof is essentially an exception.

"The sizeof operator yields the size (in bytes) of its operand, which may be
an expression or the parenthesized name of a type. The size is determined from
the type of the operand. The result is an integer." (6.5.3.4.2)

And that's how sizeof is defined. Because it uses the type to compute the
size, the type of an array includes its length, and an array operand is not
converted to a pointer in a sizeof expression, sizeof yields the total number
of bytes occupied by all the elements of the array.

[1] C11 standard (draft):
[http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf)
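
A minimal sketch of the "pass the length as a second parameter" approach. `ARRAY_LEN`, `sum` and `sum_of_five` are illustrative names, not anything from the standard:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a real array; silently wrong if handed a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* The parameter is a pointer even if spelled `const int a[]`,
   so the length has to travel separately. */
long sum(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

long sum_of_five(void)
{
    int a[5] = {1, 2, 3, 4, 5};
    /* Here `a` still has type "array of 5 int". */
    assert(sizeof(a) == 5 * sizeof(int));
    return sum(a, ARRAY_LEN(a));   /* compute the length where the type knows it */
}
```

Inside `sum`, `sizeof(a)` would be the size of a pointer, which is why the count is computed in the caller.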

------
yCloser
This stuff is all taught in the first year of C.S.

I hope that anyone who works with asm, C (and also C++) learned all this when
he was still a kid

~~~
ehntoo
From my own experience and what I've seen from major US universities, this
does not seem to be the case.

For intro (1st year-ish) CS, it looks like most places are teaching Python and
C++, with some institutions (such as my own) using Java. An ACM article from
2014 actually has some numbers here. [0]

I graduated relatively recently with a bachelor's degree, majoring in CS and
Computer Engineering. I had only one course which actually used C, and that
wasn't for my CS major. I've spent a fair amount of time since then doing low-
level work on ARM micros, but definitely wasn't taught this in school.

[0]
[http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the...](http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-introductory-teaching-language-at-top-u-s-universities/fulltext)

------
GrinningFool
Well done, Grasshopper. You are now ready to begin the true journey of
understanding.

------
zwieback
Don't even get me started on packing when you have arrays of structs...

------
notacoward
Worth noting: since everyone on HN is clearly an expert on C[1], we're just as
clearly not the audience for this post. It's obviously written for people who
haven't learned this yet, who might still be fooled by the superficial
similarity of arrays in C to arrays elsewhere (or to what any rational non-
lazy person not implementing their first compiler might expect). That doesn't
make it a bad article, so stop being so gratuitously negative. It's actually a
pretty good explanation, for somebody at that level, of how C arrays can trip
you up. I might use it myself, as a reference for some of the people I mentor.
Pedagogy matters.

[1] Or any other topic. Just ask any one of us. Apparently we all sprang fully
formed from Zeus's brow, like Athena, already endowed with every bit of
knowledge we'll ever need.

