> memory-based optimizations should not be permitted when two pointers have been converted to/from integers.
Just to be clear, what you are saying is that it is not legal to transform this function:
int foo() {
    int x = 5;
    bar();
    return x;
}
into this function:
int foo_opt() {
    bar();
    return 5;
}
And the end result of disallowing that kind of optimization is that effectively any function that transitively calls an unknown external function [1] gets -O0 performance. Note that one of the memory-based optimizations is moving values from memory to registers, which is effectively a prerequisite for virtually every useful optimization, including the bread-and-butter optimizations that provide multiple-fold speedups.
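To make that concrete, here is a minimal sketch (hypothetical names; bar() stands in for any unknown external function) of what gets lost. With memory-to-register promotion, acc lives in a register for the whole loop; under the proposed semantics the compiler must spill and reload it around every call, because bar() might forge a pointer to it from an integer:

void bar(void);  /* unknown external function */

int sum(const int *a, int n) {
    int acc = 0;                /* address never taken */
    for (int i = 0; i < n; i++) {
        acc += a[i];            /* keeping acc in a register is legal only if
                                   nothing outside this function can alias it */
        bar();                  /* must be assumed to convert pointers to/from
                                   integers, i.e. to potentially reach any memory */
    }
    return acc;
}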
I suspect many people--including you--would find such a semantics to have too wide a blast radius. And if you start shrinking the blast radius, even just to exempt such "obvious" cases as locals whose address is never taken, you introduce some notion of pointer provenance.
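For example (a hypothetical illustration, not from the thread), even the modest carve-out "locals whose address is never taken can't be reached through pointers" already draws a provenance-style line between these two variables:

void escape(int *);  /* some external function */

int two_locals(void) {
    int a = 1;       /* address never taken: "obviously" fine in a register */
    int b = 2;
    escape(&b);      /* b's address escapes, so stores through unknown
                        pointers might alias b; explaining why they cannot
                        also alias a is already a provenance judgment */
    return a + b;    /* b must be reloaded after the call; a need not be */
}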
That's how we arrived at the current state. We've given everything the "obvious" semantics, including making the "obvious" simplifying assumptions (such as address-not-taken variables not being reachable through pointers). But we have discovered--and this took decades to find out, mind you--that using "obvious" semantics causes contradictions [2].
Take a look at the quiz that prompted this discussion: <https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-disc...> and <https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf>. There are a lot of cases involving pointer provenance where asking C experts "does this work?" or "should this work?" ends in head-scratching. Indeed, if you compare several claimed formal semantics of C, there are several cases where they disagree.
[1] An unknown external function must be assumed to do anything that it is legal to do, so a compiler is forced to assume that it is converting pointers to/from integers. And if that is sufficient to prohibit all memory-based optimizations, then it follows that calling an unknown external function is sufficient to prohibit all memory-based optimizations.
[2] And just to be clear, this isn't "this is breaking new heroic optimizations we're creating today", this is a matter of "30-year-old C compilers aren't compiling this code correctly under these semantics."
I would say that that transformation is legal; it doesn't even involve pointers. I don't think anything I said precludes escape analysis. How would provenance come into play here?
#include <stdint.h>

void bar() {
    int y = 0;
    int *py = &y;
    uintptr_t scan = (uintptr_t)py;
    while (1) {
        scan++;
        char *p = (char *)scan;
        /* look for the bytes of a (little-endian) int with value 5... */
        if (p[0] == 5 && p[1] == 0 && p[2] == 0 && p[3] == 0) {
            *(int *)p = 3;  /* ...and overwrite it with 3 */
            break;
        }
    }
}
This code will scan the stack looking for an int whose value is 5 and replace it with 3. It's only undefined behavior if there's some notion of provenance: there's no pointer arithmetic here; the only arithmetic happens on integers. There's not even a strict-aliasing violation (since char can read anything). And yet, this code is capable of changing the value of x in foo to 3.
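Pairing that bar() with the foo() from upthread makes the conflict concrete (a sketch only: whether the scan actually finds x depends on stack layout and endianness, neither of which the language guarantees):

int foo() {
    int x = 5;
    bar();       /* may find x's bytes on the stack and store 3 over them */
    return x;    /* so a compiler that folds this to "return 5" has, under a
                    no-provenance semantics, miscompiled the program */
}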
> I don't think anything I said precludes escape analysis. How would provenance come into play here?
Could another approach be taken, where local variables are considered implicitly "register"? In that case this simple example has no problem whatsoever. The problem does still arise, unnecessarily, if the address of a local is taken but does not escape, but that ought to be rare.
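Something like this hypothetical reading, leaning on the rule that taking the address of a register variable is a constraint violation in C, so no pointer (forged from an integer or otherwise) can legally name x:

int foo() {
    register int x = 5;  /* &x would be ill-formed, so under the implicit-
                            register rule no scan like bar()'s can reach x */
    bar();
    return x;
}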