I haven't used pointer arithmetic for a long time but the "value" of the index depends on the type, doesn't it? For a char[], it's (a + index), but for an int[] it should be (a + 4*index), or (a + sizeof(type)*index).
I have never understood how the compiler is supposed to know that it's (a + 4*index) and not (index + 4*a), which gives different results. Maybe it's based on the type itself, and the computation is done by multiplying the "integral" index by its size and adding the "pointer" if there is one.
But even this can give wrong values if you're on an embedded platform where you can store your pointers as integers, size_t, or simple defines without a type.
If anyone knows where I can find the explanation in the standard, I've been wondering about it for at least 20 years...
Edit: it seems that a simple example gives the solution: (void*)[void*] is invalid, and int[int] is invalid too, therefore the compiler will get the pointer as the address, and the integral type as the index. I feel stupid for not testing this earlier.
When you do `*(pointer + 1)`, it'll actually increment the address of the pointer by `sizeof(*pointer)`, so it indeed works the same as indexing `pointer[1]`
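A minimal sketch of that scaling, for anyone who wants to check it locally. The helper names `step_in_bytes` and `same_element` are made up for illustration; the point is only that the compiler picks the scale factor from the pointer operand's type, whichever side of the `+` or `[]` it appears on.

```cpp
#include <cassert>
#include <cstddef>

// Byte distance between p and p + 1. The compiler scales the "+ 1" by
// sizeof(T) because p has type "pointer to T".
template <typename T>
std::size_t step_in_bytes(const T* p) {
    assert(p != nullptr);
    const char* lo = reinterpret_cast<const char*>(p);
    const char* hi = reinterpret_cast<const char*>(p + 1);
    return static_cast<std::size_t>(hi - lo);
}

// True when all four spellings of "element i" name the same object.
// The integer operand is always the index, the pointer operand is always
// the base address, regardless of which one is written first.
bool same_element(const int* a, std::size_t i) {
    return &a[i] == &*(a + i)
        && &a[i] == &i[a]
        && &a[i] == &*(i + a);
}
```

Calling `step_in_bytes` on an `int` array should give `sizeof(int)`, and on a `char` array 1.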
The key point of the article here is that this is no longer strict syntactic sugar in C++ as of C++17. The addition operator, like many operators and all function calls, does not specify the order in which its arguments are evaluated. In C, and in C++ prior to C++17, neither did the indexing operator; but as of C++17, the order is fully defined (`a` first in `a[i]`).
Apart from sequencing rules, a[i] may also differ in value category. If one of the operands is an array rvalue then the result is also an rvalue. *(a + i) can't propagate value category this way: dereferencing always produces an lvalue.
What this is talking about is the behavior of [] in C and C++. In these
x[y]
and
y[x]
are the same. I was first introduced to this by a friend of mine many many years ago (because I'm an old) as
for (i = 0; etc)
putc(i["hello world"])
or similar nonsense.
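A compilable take on that trick, as a sketch only: the function name `copy_with_reversed_index` is invented here, and it builds a string instead of printing so the result is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies a string literal one character at a time using the reversed
// index[array] spelling.
std::string copy_with_reversed_index() {
    std::string out;
    for (std::size_t i = 0; i < 11; ++i) {  // "hello world" has 11 characters
        // i["hello world"] means *(i + "hello world"), the same element
        // as "hello world"[i].
        assert(i["hello world"] == "hello world"[i]);
        out += i["hello world"];
    }
    return out;
}
```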
The C++ change that makes this matter is that, apparently, pre-C++17 the standard didn't sequence the two operands: expression1[expression2] didn't require expression1 to be evaluated before expression2. C++17 does actually fix the sequencing to be left to right, so now expression1[expression2] will always be evaluated as expression1;expression2 and expression2[expression1] will always be evaluated as expression2;expression1, without depending on unspecified evaluation order.
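One way to observe the difference is to log the calls. This is a sketch under assumptions: `call_log`, `array_operand` and `index_operand` are hypothetical names, and the logged order for the subscript form is only guaranteed when compiling as C++17 or later. For the `*(a + i)` form the order stays up to the compiler.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> call_log;   // hypothetical shared state
int storage[4] = {10, 20, 30, 40};

int* array_operand() { call_log.push_back("array"); return storage; }
int index_operand()  { call_log.push_back("index"); return 2; }

// C++17 and later: array_operand() is evaluated before index_operand().
int subscript_form() {
    call_log.clear();
    return array_operand()[index_operand()];
}

// The two calls may be evaluated in either order, in any standard.
int pointer_form() {
    call_log.clear();
    return *(array_operand() + index_operand());
}

// Both spellings read the same element, whatever the call order was.
void check_same_value() {
    assert(subscript_form() == pointer_form());
}
```

After `subscript_form()` returns, `call_log` holds the order the compiler actually chose, so printing it under different `-std=` settings is a quick experiment.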
there is the same evaluation order issue, and they should have the same behaviour: is a evaluated before b or the compiler is allowed to choose an order? Clearly it depends on the rules for
a+b
which are more important than the special case of array indexing.
It is extremely rare even in ancient C code. The only reason why it exists in C at all is because rewriting a[i] as *(a+i) was inherited by C from B; the latter only has a single machine word type for everything, so there was no way to define it such that a[i] would be legal but i[a] would be illegal in B.
Oh C++… The level of excitement over pointless triviality you generate never ceases to amaze me.
Some nerds somewhere are getting all giddy about this silliness when it would just be objectively better to not have this silly quirk in the first place and to write it like a sane person who recognizes the great social benefits of maximizing understanding:
auto idx = index();
return p[idx];
Clever generally just means “bad”. Why people get so excited about it mystifies me…
This was previously an undefined order. It is now defined. Any code that you would have reviewed previously requires no additional knowledge to review now. Any code that is depending on this new C++17 defined behavior is either pushing the cleverness ceiling, or is going to document the dependency.
Anyone not using something like Sonar in their pipelines is really doing themselves a disservice. Even with languages considered less bloated, like Python, Java, and C#, I cannot keep up with all the rules after circa 25 years of ecosystem evolution (less for C#).
Better question: why did C and C++ not define this as left to right (the answer is perf claims decades ago, that are not really valid now, and questionable even then).
The example
a[b]
doesn't really exhibit the unexpected, but as I understand it
f()->a[b()]
could evaluate as
temp = b()
f()->a[temp]
which it seems reasonable to consider unexpected. The lack of left to right sequencing here seems not-dissimilar to the lack of sequencing in call arguments, which was also realized to be a needless footgun and fixed.
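For code that has to behave the same under older standards, naming the temporaries pins the order down explicitly. A sketch, where `Holder`, `trace`, and the two wrapper functions are hypothetical stand-ins for the `f()->a[b()]` example:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Holder { int a[3]; };

std::vector<std::string> trace;   // hypothetical call trace
Holder holder = {{7, 8, 9}};

Holder* f() { trace.push_back("f"); return &holder; }
int b()     { trace.push_back("b"); return 1; }

// The rewrite described above: b() runs first because it is a separate
// statement, not because of any operator sequencing rule.
int index_then_object() {
    trace.clear();
    int temp = b();
    assert(temp >= 0 && temp < 3);
    return f()->a[temp];
}

// The left-to-right order, spelled out so it holds in every standard.
int object_then_index() {
    trace.clear();
    Holder* h = f();
    int temp = b();
    assert(temp >= 0 && temp < 3);
    return h->a[temp];
}
```

Either version reads the same element; only the order of side effects differs, and here that order is visible in the source.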
Half of the new features of "modern" C++ editions leave me agreeing with the change but wondering what they were thinking before. Another example: std::map::contains was introduced in C++20; why did they go decades thinking it wasn't necessary to have such a member function (instead providing only count() even though a map has unique keys), and why did they change their minds only now? Did the old guard literally die off or something?
The historical lack of sequencing is because they were unsequenced in C, and C++ thus inherited it.
And yes, it does seem that there were a lot more people hell bent on resisting correcting unnecessary UB in the past than today, but they’re still around.
A pity the c++ compiler has no way to recognise calls to the c++ library in order to do rewrites like that automatically. It's CSE at the stdlib level, should definitely be possible to do that.
Hard-coding behavior like that for the STL seems pretty questionable, especially given that std::map and std::unordered_map have poor performance compared to other alternatives (e.g. absl::btree_map and absl::flat_hash_map, and likewise folly has better implementations).
One subtle but noteworthy thing is that the idiom only mentions map by name once, which is nice if you have a very long map name, and prevents copy paste bugs where you only update one of the two mentions.
This pattern exists in C++ as well. The specific issue here is that all these STL APIs are in terms of C++'s iterator model, and can't be replaced with a more modern "optional" style that allows this cleaner coding style :-/
I think the problem is that with `std::map::find(item)` you get an iterator to the element back, but you don't with `std::map::contains(item)`, which means a second lookup to actually retrieve the value, no?
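To make the one-lookup versus two-lookup distinction concrete, here is a small sketch. The function names and the `fallback` parameter are invented for illustration; `count()` is used for the membership test so it compiles before C++20, and `contains()` would read the same way.

```cpp
#include <cassert>
#include <map>
#include <string>

// Two lookups: one to test membership, a second to fetch the value.
int lookup_twice(const std::map<std::string, int>& m,
                 const std::string& key, int fallback) {
    if (m.count(key) != 0) {
        return m.at(key);
    }
    return fallback;
}

// One lookup: find() returns an iterator that already points at the element.
int lookup_once(const std::map<std::string, int>& m,
                const std::string& key, int fallback) {
    auto it = m.find(key);
    return it != m.end() ? it->second : fallback;
}

// Both styles should agree on every key; only the amount of work differs.
void check_agree(const std::map<std::string, int>& m, const std::string& key) {
    assert(lookup_once(m, key, -1) == lookup_twice(m, key, -1));
}
```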
Because otherwise the evaluation order of the following construct is unspecified, and you can get different results if you compile your code with different compilers:
foo()[bar()];
In nearly all cases it won't matter, but it can matter if foo() and bar() have some kind of shared state (or affect each other's return values, but seriously don't do that). In general it's better to have a defined evaluation order unless there's some compelling reason not to.
But both C and C++ are comfortable with *(foo() + bar()) having unspecified evaluation order, so foo()[bar()] potentially having order-dependent results is clearly not sufficient reason for this decision...
In C++14, the foo() calls happened in unspecified order.
C++17 guarantees that "a(b)" and "a[b]" evaluate "a" first, which effectively means the foo() calls will now happen in order (1 to 4).