How to find size of an array in C without sizeof (arjunsreedharan.org)
I'm surprised at all of the comments calling this stupid or pointless. The point is not that you should use this trick in lieu of sizeof; the point is to shed light on a subtlety of C arrays.

I suspect this article made a lot of people feel stupid, or in other words, it taught us something. Sometimes the ego gets out of check.

I think the article is well-presented and educational.

Quite. This is exactly the sort of thing that makes C such a fun language.

I'm not sure that's praise for C, though. Arcane design and lack of clarity might be fun to decipher, but they're not something you'd want to see in a programming language.

Doesn't every language turn into insanity to decipher once you look closely enough?

You've been on here 4 years and are surprised at the top comments criticizing the content of a post? :)

There's a reason this meme exists: http://i.imgur.com/Z6pFTjj.jpg

The result you get with this trick is signed, while the result you get with sizeof is unsigned.

Edit: Just to clarify, what you get is ptrdiff_t instead of size_t. So if the array has more than PTRDIFF_MAX elements, the subtraction is undefined behavior [1].

[1] http://en.cppreference.com/w/c/types/ptrdiff_t
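To make the signed/unsigned point concrete, here is a minimal sketch that asks the compiler which type each expression has. It assumes a C11 compiler for _Generic; the names KIND_OF, kind_of_trick, kind_of_sizeof and check_kinds are made up for illustration. _Generic does not evaluate its controlling expression, so the pointer trick is only type-checked here, never executed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 for ptrdiff_t, 2 for size_t, 0 for anything else.
   The controlling expression of _Generic is not evaluated. */
#define KIND_OF(expr) _Generic((expr), ptrdiff_t: 1, size_t: 2, default: 0)

static int arr[5];

/* Type of the pointer-subtraction trick. */
int kind_of_trick(void)
{
    return KIND_OF((&arr)[1] - arr);
}

/* Type of the usual sizeof division. */
int kind_of_sizeof(void)
{
    return KIND_OF(sizeof arr / sizeof arr[0]);
}

/* Usage: the trick is signed (ptrdiff_t), sizeof is unsigned (size_t). */
void check_kinds(void)
{
    assert(kind_of_trick() == 1);
    assert(kind_of_sizeof() == 2);
}
```

This is also why the two results want different printf conversions: %td for ptrdiff_t and %zu for size_t.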

How likely are you to run into an array bigger than 2 GB?

The "how likely is it, really?" response to questions of technical correctness has always bothered me. It takes a mindset completely alien to mine to say "Here's a race condition. Sure, it's undefined behavior, but the race is narrow, so it's rare" or to say "Sure, memory allocation can theoretically fail, but in practice almost never does" or to say "fsync is too slow and most computers have batteries these days".

Software is unreliable enough as it is due to problems beneath our notice. It seems reckless to avoid fixing problems that we do notice. Sure, you could argue that rare problems are rare and that users probably won't notice them --- this attitude is penny-wise and pound-foolish, because you can't meaningfully reason about a system that's only probably correct.

The problem you're latching on to, I think, is that the context for calculating a probability can vary.

If it were really as likely as, say, the sun exploding that X happened then it would be of no use to expend time on X.

BUT very often people speak about the probability of events given suspicious constraints. While a memory allocation might not fail in most situations, it will fail often in some situations. And a one-in-a-million chance is almost guaranteed when there are millions of uses.

Not likely, but possible. This reminds me of the bug that was found in the binary search algorithm a few years ago, IIRC, in Java. The interesting thing is that binary search is probably one of the earliest-invented algorithms. Yet, in the book Writing Efficient Programs by Jon Bentley (which I mentioned in a recent HN comment), he says that in a class he taught to several industrial programmers with many years of experience, some had bugs in their implementations of binary search that he set them as an exercise. Not sure but I think I remember reading in the article about the Java binary search issue, that even his algorithm had the bug that was found in the Java version. Why it was not found earlier is (maybe) because it only occurred with an extremely large array, IIRC. Don't have a link right now, but it can probably be found by searching for the right phrase.
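For anyone who doesn't remember the details: the bug was in the midpoint calculation, where (low + high) / 2 overflows a signed int once the indices get large enough. Below is a rough sketch of a binary search with the overflow-safe midpoint. It is written from memory in C rather than copied from the Java code, and find_sorted is a made-up name.

```c
#include <assert.h>

/* Hypothetical helper: returns the index of key in the sorted array
   a[0..n-1], or -1 if key is not present. */
int find_sorted(const int *a, int n, int key)
{
    assert(n >= 0);

    int low = 0;
    int high = n - 1;

    while (low <= high) {
        /* (low + high) / 2 can overflow int for very large arrays.
           low + (high - low) / 2 cannot, since 0 <= low <= high here. */
        int mid = low + (high - low) / 2;

        if (a[mid] < key)
            low = mid + 1;
        else if (a[mid] > key)
            high = mid - 1;
        else
            return mid;
    }
    return -1;
}
```

With small arrays both midpoint formulas give the same answer, which is presumably why the overflow went unnoticed for so long.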

Today, with ML, big data and similar applications, that might be often.

Probably not very likely, but keep in mind that this method could also be used without actually allocating the array -- akin to the 'offsetof()' macro. (Which is undefined behavior.)

That pun in the first sentence alone made the article worth it.

For completeness' sake, the size of an array can also be computed via linker symbols; see for example: http://stackoverflow.com/questions/29901788/finding-the-last....

Same constraints apply (pointer arith).

I am not sure why this method, applied to ordinary arrays, would be preferred to sizeof (), but since we're shedding light here...

EDIT: pointer arith constraints only apply if we compute the difference (end - beg) in the C code. We could also do that in the linker script itself, and I don't recall whether or not the C semantics of ptrdiff_t would be preserved in that case. Such preservation doesn't seem very probable to me, so this method might potentially avoid overflows (or move them much higher) -- to be checked in the 'ld' doc!

C is such a boondoggle of a language... We're condemned to forever explore its every weird nook and cranny for historical reasons, rather than because it is the cleanest, best approach to things possible.

Was anyone else's first thought "Hmm... cool," followed by "I hope nobody asks me this on an interview?"

Interesting. I've been working with C for almost 30 years (first taught it to myself when I was 14) and never thought about the actual type of array.

Whether you use this method of getting the number of elements in an array or the more traditional sizeof method, please encapsulate the logic in a macro.

Instead of writing either of these:

  size_t length = sizeof array / sizeof array[0];

  size_t length = (&array)[1] - array;

Define this macro instead:

  #define elementsof( array )  ( sizeof(array) / sizeof((array)[0]) )

Or if you must:

  #define elementsof( array )  ( (&(array))[1] - (array) )

And then you can just say:

  size_t length = elementsof(array);

I've also seen the name 'countof' used for this macro. As long as the meaning is clear, the specific name is less important than using a macro in the first place.

A more detailed article here: http://www.g-truc.net/post-0708.html

with a cleaner way to do _countof using a template in C++11.

You can also use the template technique to pass a fixed size array to a function, and have the function determine the array size (without needing a 2nd length param, or null terminator element.). Similar to strcpy_s(): http://stackoverflow.com/questions/23307268/how-does-strcpy-...

MSVC has a built in _countof: http://stackoverflow.com/questions/4415530/equivalents-to-ms...

> please encapsulate the logic in a macro.

Why?

When reading such code, it means I would have to go and look up a macro definition. So, there's a clear drawback. What's the benefit that makes it worthwhile?

Faster to read, and keeps the reader's mind at a semantically higher level.

I doubt the author meant for this trick to be actually used, they were just showing how pointers to arrays are typed correctly in a clever way.

Indeed, one could hope that is the case! :-)

But my point with suggesting the macro applies equally to the more traditional sizeof division. I have seen code that divides the two sizeofs every time an array length is needed. I think it's better to put that calculation in a macro so you only do it in one place.

IIRC it is canonically called NELEMS(a).

Despite the argument at the end, this is undefined behavior in the latest C specification. The code dereferences a pointer one past the last element.

C11 6.5.6/8:

If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated

"it shall not be used as the operand of a unary * operator that is evaluated"

he doesn't use the * operator on it, he just calculates its position. If he were to access it (ie, use it with *) then that would be breaking the rule

I think the authors of the spec really meant something else: reading/writing a memory location past the end of the array is illegal. But here "*" is used only in an address computation, not to actually access memory.

Shows how difficult it is to get a spec right.

So, IMO, you are right, the code in the article is illegal (strictly speaking).

But I think it is likely that most compilers would still allow it, because that clause in the spec essentially exempts the compiler from adding an explicit bounds check.

I don't think this is illegal. What is the clause in the spec that allows &arr[1]? I would try and see if it also applies to (&arr)[1].

The snippet only calculates the pointer, and does not dereference it. Should be fine.

It's a complicated situation. There's a pointer to an array, and that pointer is dereferenced, resulting in an array (that then decays to a pointer). But that second array/pointer is not dereferenced. I'm not sure if it's legal.
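Spelling the steps out may help. (&arr)[1] is by definition *(&arr + 1), so the one-past-the-end pointer really is the operand of unary *, even though no memory is read. The sketch below gets the same count without ever applying * to that pointer, by converting it instead. As far as I can tell this sidesteps the clause being argued about, but treat that as my reading rather than a ruling; count_without_deref is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

static int arr[5];

/* Hypothetical helper: element count from the one-past-the-end pointer,
   without using it as the operand of unary *. */
ptrdiff_t count_without_deref(void)
{
    /* &arr has type int (*)[5]; adding 1 forms a pointer one past arr,
       which is valid to compute. */
    int (*past)[5] = &arr + 1;

    /* Convert rather than dereference: this is the address just after
       the last element of arr. */
    int *end = (int *)past;
    assert((void *)end == (void *)(arr + 5));

    /* Both operands now have type int *, so the difference is in elements. */
    return end - arr;
}
```

The article's version differs only in that it writes *past (as (&arr)[1]) and lets the resulting array decay to int *.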

How is this better than the sizeof method? This looks like a clever way to access sizeof information without explicitly using the sizeof operator.

Quoting http://stackoverflow.com/a/16019052/1470607

  Note that this trick will only work in places where `sizeof` would have worked anyway.
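A small sketch of what that means in practice: once an array has been passed to a function, the callee only has a pointer, and neither sizeof nor the pointer trick can recover the element count. The function names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: inside the callee, p is a plain pointer. The
   caller's array length is not part of its type, so sizeof p is the size
   of a pointer. Likewise (&p)[1] - p would not measure the caller's array. */
size_t size_seen_by_callee(const int *p)
{
    assert(p != NULL);
    return sizeof p;
}

/* Hypothetical helper: where the array declaration is in scope,
   the count is available. */
size_t count_in_scope(void)
{
    int arr[5] = {0};
    return sizeof arr / sizeof arr[0];   /* arr is still an array here */
}
```

So both approaches need the real array type to be visible at the point of use; for anything passed around as a pointer, the length has to travel separately.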

Yes. This only works for arrays on the stack, at best. It assumes that arrays are placed on the stack in the order of declaration, which is not a requirement of the C standard and may differ between compilers.

Unless you're writing a buffer overflow exploit, in which case you need to know exactly what's on the stack and where, this isn't a good way to program.

I don't see how the code assumes anything about the placement of the array. Indeed, it works just fine for static arrays:

    $ cat test.c
    #include <stdio.h>
    
    int arr[5];
    
    int main(int argc, char *argv[]) {
    	printf("%zu, %td\n", sizeof(arr) / sizeof(*arr), (&arr)[1] - arr);
    }
    
    $ gcc test.c && ./a.out
    5, 5

Not saying it's "a good way to program" - it's needlessly obfuscated compared to the standard sizeof alternative. But it doesn't rely on anything tricky.

> It assumes that arrays are placed on the stack in the order of declaration

I am not sure it is the case here. The code uses only one array, how can it assume the order of arrays?

http://stackoverflow.com/questions/671790/how-does-sizeofarr...

I think this is effectively doing the same thing, but in a non-standard way; i.e. I think `int n = (&arr)[1] - arr;` is replaced with the actual number by the compiler the same way sizeof() would be, only no one will know wtf is going on.

Disclaimer: I didn't look at the generated code to confirm; I guess it could even be compiler/runtime dependent.

I don't think anyone is proposing that people use this. I read it as an exercise to stretch our understanding of other bits of the language.

This is undefined behavior. &arr + 1 can overflow. There's no guarantee that &arr isn't near the end of memory. &arr + 1 is converted at compile time to rbp - X, where X is an integer determined by the compiler, similarly to how sizeof works.

Basically ptr + integer requires the compiler to determine the sizeof ptr's type.

No. From 6.5.6 Additive operators:

7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

8 [...] if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

So &arr + 2 can overflow, and &arr + 1 cannot be dereferenced, but &arr + 1 shall not overflow and is not undefined behaviour.

But arr != &arr even though they have the same value. #8 applies to arr (P), but in the post OP is using &arr which is a ptr to array[x] and doesn't apply to it.

That's why I quoted paragraph 7: arr is not an element of an array, so &arr (being a pointer to an object which is not an element of an array) behaves like a pointer to the first element of an array of length one with the type of arr as its element type.

So &arr behaves like it's a pointer to the start of int[5][1].
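A quick way to see the difference between arr and &arr is to measure how far each one moves when you add 1. This is only a sketch, with made-up function names; it forms &arr + 1 but never dereferences it.

```c
#include <assert.h>
#include <stddef.h>

static int arr[5];

/* Hypothetical helper: bytes advanced by arr + 1. Here arr decays to
   int *, so the step is one int. */
size_t step_of_arr(void)
{
    return (size_t)((char *)(arr + 1) - (char *)arr);
}

/* Hypothetical helper: bytes advanced by &arr + 1. &arr has type
   int (*)[5], so the step is the whole array. */
size_t step_of_addr_of_arr(void)
{
    /* Same address, different pointer types. */
    assert((void *)arr == (void *)&arr);
    return (size_t)((char *)(&arr + 1) - (char *)&arr);
}
```

The first step is sizeof(int) and the second is sizeof arr, which is exactly the "array of length one whose element type is int[5]" view from paragraph 7.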

Can't we declare a pointer of the same type as &arr, assign it there, and be sure that it points to the equivalent of array[1] of &arr? If yes, is it logically possible to have UB on that?

Many implementations historically also allocated enough memory to include one extra element at the end of the array.

The printf commands say "the address of..." but proceed to print out the value, not address.

