C++17 creates a practical use of the backward array index operator

omoikane · on April 3, 2023

Backward array index always had a use in the code golf context, like this:

    int a[3] = {0, 1, 2};
    int *p = a;
    int **q = &p;

    int x = (*q)[1];  // Read a[1]
    int y = 1[*q];    // Same, but saves 2 bytes

Code golf considerations aren't always related to practicality, of course.

zeusk · on April 3, 2023

If saving bytes is the goal, why not

> int a[]={0,1,2};

> int y = a[1];

pritambaral · on April 3, 2023

It's a demonstration, not real code. The first block of code is prologue, not main body.

lvkv · on April 3, 2023

I’ve always thought of the array index operator:

  a[index]

as syntactic sugar for the pointer arithmetic:

  *(a + index)

From this point of view, the existence of a “backwards” index operator makes sense; the arithmetic evaluates to the same address.
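
A minimal sketch of that equivalence, in case it helps. The function name `same_element` is made up for illustration; it only checks that the three spellings refer to the same object.

```cpp
#include <cstddef>

// Returns true when all three spellings of "element i of a" name the same object.
bool same_element(int* a, std::ptrdiff_t i) {
    return &a[i] == &*(a + i)   // a[i] is defined as *(a + i)
        && &a[i] == &i[a];      // addition commutes, so i[a] is *(i + a)
}
```

For an `int data[3]`, `same_element(data, 1)` should be true, and `1[data]` reads the same element as `data[1]`.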

JTyQZSnP3cQGa8B · on April 3, 2023

I haven't used pointer arithmetic for a long time but the "value" of the index depends on the type, doesn't it? For a char[], it's (a + index), but for an int[] it should be (a + 4*index), or (a + sizeof(type)*index).

I have never understood how the compiler is supposed to know that it's (a + 4*index) and not (index + 4*a), which gives different results. Maybe it's based on the type itself and the computation is done by multiplying the "integral" index by its size, and adding the "pointer" if there is one.

But even this can give wrong values if you're on an embedded platform where you can store your pointers as integers, size_t, or simple defines without a type.

If anyone knows where I can find the explanation in the standard, I've been wondering about it for at least 20 years...

Edit: it seems that a simple example gives the solution: (void*)[void*] is invalid, and int[int] is invalid too, therefore the compiler will get the pointer as the address, and the integral type as the index. I feel stupid for not testing this earlier.

FreeFull · on April 3, 2023

When you do `*(pointer + 1)`, it'll actually increment the address of the pointer by `sizeof(*pointer)`, so it indeed works the same as indexing `pointer[1]`
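
A small sketch to make the scaling visible, assuming nothing beyond the standard library. `byte_step` is a hypothetical helper that measures how many bytes `p + 1` is away from `p`.

```cpp
#include <cstddef>

// Byte distance between p and p + 1, measured through char pointers.
// The "+ 1" is in units of T, so the result is sizeof(T).
template <typename T>
std::ptrdiff_t byte_step(T* p) {
    return reinterpret_cast<const char*>(p + 1) - reinterpret_cast<const char*>(p);
}
```

Called on an `int` array this should give `sizeof(int)`, and on a `char` array it should give 1, which is why the compiler needs to know which operand is the pointer.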

addaon · on April 4, 2023

The key point of the article here is that this is no longer strict syntactic sugar in C++ as of C++17. The addition operator, like many operators and all function calls, does not specify the order in which its arguments are evaluated. In C, and in C++ prior to C++17, neither did the indexing operator; but as of C++17, the order is fully defined (`a` first in `a[i]`).

leni536 · on April 4, 2023

Apart from sequencing rules, a[i] may also differ in value category. If one of the operands is an array rvalue then the result is also an rvalue. *(a + i) can't propagate value category this way, because dereferencing always produces an lvalue.

mhh__ · on April 3, 2023

That's exactly what it is in C

devnull3 · on April 3, 2023

> Astound your friends! Confuse your enemies!

... more like your friends will curse you and your enemies will watch with glee that you are using C++

olliej · on April 3, 2023

What this is talking about is the behavior of [] in C and C++. In these

    x[y]

and

    y[x]

are the same. I was first introduced to this by a friend of mine many many years ago (because I'm an old) as

    for (int i = 0; i < 11; i++)
        putchar(i["hello world"]);

or similar nonsense.

The C++ change that makes this matter is that, apparently, before C++17 the language did not sequence the operands: expression1[expression2] did not require expression1 to be evaluated before expression2. C++17 fixes the sequencing to be left to right, so now expression1[expression2] will always evaluate as expression1;expression2 and expression2[expression1] will always evaluate as expression2;expression1, without depending on unspecified evaluation order.
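
Here is a rough sketch of how one could observe that. The helpers `get_base` and `pick_index`, the `trace` string, and the `values` array are all invented for the demonstration; the ordering described in the comments only holds when compiling as C++17 or later.

```cpp
#include <string>

int values[3] = {10, 20, 30};
std::string trace;  // records which helper ran first

// Hypothetical helpers standing in for expression1 and expression2.
int* get_base()   { trace += 'B'; return values; }
int  pick_index() { trace += 'I'; return 1; }

// Forward form: under C++17 rules get_base() runs before pick_index().
int forward()  { trace.clear(); return get_base()[pick_index()]; }

// Backward form: same element, but under C++17 rules pick_index() runs first.
int backward() { trace.clear(); return pick_index()[get_base()]; }
```

Both functions read the same element. Only the contents of `trace` afterwards differ, and only in C++17 and later is that difference something you can rely on.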

HelloNurse · on April 4, 2023

In the competing array indexing syntax

  *(a+b)

there is the same evaluation order issue, and they should have the same behaviour: is a evaluated before b, or is the compiler allowed to choose an order? Clearly it depends on the rules for

a+b

which are more important than the special case of array indexing.

jbverschoor · on April 3, 2023

I’ve never seen that before. Luckily

olliej · on April 3, 2023

He was also a tutor/TA and I think mostly used it to explain how pointers vs. indexing worked. Or possibly to traumatize students :D

int_19h · on April 3, 2023

It is extremely rare even in ancient C code. The only reason why it exists in C at all is because rewriting a[i] as *(a+i) was inherited by C from B; the latter only has a single machine word type for everything, so there was no way to define it such that a[i] would be legal but i[a] would be illegal in B.

TylerGlaiel · on April 3, 2023

please... just explicitly calculate index() first on its own line if the order matters like this...

orangepanda · on April 4, 2023

Would compilers optimise the unnecessary variable away?

xorvoid · on April 4, 2023

Oh C++… The level of excitement over pointless triviality you generate never ceases to amaze me.

Some nerds somewhere are getting all giddy about this silliness when it would just be objectively better to not have this silly quirk in the first place and to write it like a sane person who recognizes the great social benefits of maximizing understanding:

auto idx = index(); return p[idx];

Clever generally just means “bad”. Why people get so excited about it mystifies me…

bfrog · on April 4, 2023

Ah yes, another rule to try and remember while writing and reviewing C++

addaon · on April 4, 2023

This was previously an undefined order. It is now defined. Any code that you would have reviewed previously requires no additional knowledge to review now. Any code that is depending on this new C++17 defined behavior is either pushing the cleverness ceiling, or is going to document the dependency.

pjmlp · on April 4, 2023

Anyone not using something like Sonar in their pipelines is really doing themselves a disservice, even with languages considered less bloated like Python, Java, C#, I cannot keep up with all the rules after circa 25 years of ecosystem evolution (less for C#).

gumby · on April 3, 2023

If you care, just use the + operator which is unambiguous.

cornstalks · on April 3, 2023

> Starting in C++17, a[b] always evaluates a before evaluating b.

Okay, I'll bite. Why did C++17 specify this?

olliej · on April 3, 2023

Better question: why did C and C++ not define this as left to right? (The answer is perf claims from decades ago that are not really valid now, and were questionable even then.)

The example

   a[b]

doesn't really exhibit the unexpected, but as I understand it

   f()->a[b()]

could evaluate as

   temp = b()
   f()->a[temp]

which it seems reasonable to consider unexpected. The lack of left to right sequencing here seems not-dissimilar to the lack of sequencing in call arguments, which was also realized to be a needless footgun and fixed.

TremendousJudge · on April 3, 2023

Half of the new features of "modern" C++ editions leave me agreeing with the change but wondering what they were thinking before. Another example: std::map::contains was introduced in C++20; why did they go decades thinking it wasn't necessary to have such a member function (instead providing only count() even though a map has unique keys), and why did they change their minds only now? Did the old guard literally die off or something?

olliej · on April 3, 2023

The historical lack of sequencing is because they were unsequenced in C, and C++ thus inherited it.

And yes, it does seem that there were a lot more people hell bent on resisting correcting unnecessary UB in the past than today, but they’re still around.

TremendousJudge · on April 3, 2023

I get it, but then, why change it now? (or 6 years ago I guess)

olliej · on April 3, 2023

Sorry, I had an unsaved edit above. It seems the resistance to removing unnecessary UB has reduced over the years.

tubs · on April 3, 2023

Problem with contains is we are now going to see more code like

    if (map.contains(foo)) {
       bob(map[foo]);
    }

over the (vastly uglier) more efficient:

    if (auto it = map.find(foo); it != map.end()) {
       bob(it->second);
    }
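
For what it's worth, the single-lookup form can be wrapped up once so call sites stay short. This is only a sketch: `find_value` is a hypothetical helper, and the key and value types are picked arbitrarily for the example.

```cpp
#include <map>
#include <string>

// Hypothetical helper: one lookup, returning a pointer to the mapped value
// or nullptr when the key is absent.
const int* find_value(const std::map<std::string, int>& m, const std::string& key) {
    if (auto it = m.find(key); it != m.end()) {
        return &it->second;   // *it is a key/value pair; the value is it->second
    }
    return nullptr;
}
```

A caller can then write `if (const int* v = find_value(m, key)) bob(*v);`, which names the map once and searches it once.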

Of course languages like C# manage this in a more elegant way with out parameters declarable at the call site.

JonChesterfield · on April 3, 2023

A pity the c++ compiler has no way to recognise calls to the c++ library in order to do rewrites like that automatically. It's CSE at the stdlib level, should definitely be possible to do that.

eklitzke · on April 3, 2023

Hard-coding behavior like that for the STL seems pretty questionable, especially given that std::map and std::unordered_map have poor performance compared to other alternatives (e.g. absl::btree_map and absl::flat_hash_map, and likewise folly has better implementations).

im3w1l · on April 4, 2023

if let is such a nice way to express this

    if let Some(x) = map.get(foo) {
        bob(x)
    }

One subtle but noteworthy thing is that the idiom only mentions map by name once, which is nice if you have a very long map name, and prevents copy paste bugs where you only update one of the two mentions.

olliej · on April 4, 2023

This pattern exists in C++ as well. The specific issue here is that all these STL APIs are in terms of C++'s iterator model, and can't be replaced with a more modern "optional" style that allows this cleaner coding style :-/

paulddraper · on April 4, 2023

Scala:

    for (x <- map.get(foo)) {
      bob(x)
    }

    // or

    map.get(foo).foreach(bob)

    // or

    map.get(foo) match {
      case Some(x) => bob(x)
      case None    => ()
    }

PaulDavisThe1st · on April 3, 2023

  std::map::find (item) != map.end()  <=> std::map::contains (item)

zimpenfish · on April 4, 2023

I think the problem is that with `std::map::find(item)` you get the value back but you don't with `std::map::contains(item)` - which means a second lookup to actually retrieve the value, no?

eklitzke · on April 3, 2023

Because otherwise the evaluation order of the following construct is undefined, and you can get different results if you compile your code with different compilers:

  foo()[bar()];

In nearly all cases it won't matter, but it can matter if foo() and bar() have some kind of shared state (or affect each other's return values, but seriously don't do that). In general it's better to have a defined evaluation order unless there's some compelling reason not to.

Dylan16807 · on April 4, 2023

> In general it's better to have a defined evaluation order unless there's some compelling reason not to.

That is not the stance C++ has taken in the past, so the question of why they changed it is still unanswered.

addaon · on April 4, 2023

But both C and C++ are comfortable with *(foo() + bar()) having undefined evaluation order, so foo()[bar()] potentially having order-dependent results is clearly not sufficient reason for this decision...

ynik · on April 4, 2023

To allow for chaining. Consider a statement like:

    a.f(foo(1)).g(foo(2)).arr[foo(3)].h(foo(4));

In C++14, the foo() calls happened in unspecified order. C++17 guarantees that "a(b)" and "a[b]" evaluate "a" first, which effectively means the foo() calls will now happen in order (1 to 4).
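
A sketch of that statement with just enough scaffolding to compile. The types `Chain` and `Leaf`, the `calls` log, and the body of `foo` are assumptions made up to fit the expression; the 1-to-4 order is only guaranteed when building as C++17 or later.

```cpp
#include <vector>

std::vector<int> calls;  // order in which foo() was invoked

int foo(int n) { calls.push_back(n); return n; }

// Hypothetical types shaped to fit the chained expression.
struct Leaf { void h(int) {} };
struct Chain {
    Leaf arr[4];
    Chain& f(int) { return *this; }
    Chain& g(int) { return *this; }
};

// Runs the chained statement and returns the observed call order.
std::vector<int> run_chain() {
    calls.clear();
    Chain a;
    a.f(foo(1)).g(foo(2)).arr[foo(3)].h(foo(4));
    return calls;
}
```

With a C++17 compiler the returned vector should list the arguments in source order; with an older standard the same four calls happen but their order is up to the compiler.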

tpoacher · on April 4, 2023

Before reading the article, I thought he was talking about negative indices.

I was reading K&R the other day and spotted the bit where they mention that C supports negative indices, with an example. My mind was blown.
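
Not the K&R example itself, just a small sketch of the idea: a negative index is fine as long as the pointer sits far enough inside the array. `element_before` is a made-up name.

```cpp
// Reads the element just before the one `mid` points at.
int element_before(const int* mid) {
    return mid[-1];   // same as *(mid - 1); valid only if that element exists
}
```

Given `int a[5] = {10, 20, 30, 40, 50};`, calling `element_before(a + 2)` reads `a[1]`. Calling it with `a` itself would step outside the array, which is undefined behavior.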