To clarify my position, I am not advocating that the situation be left unclarified. What I am advocating is that the resolution be to limit pointer provenance severely: roughly speaking, pointers cast to/from integers might alias, and memory-based optimizations should not be permitted when two pointers have been converted to/from integers. I further claim that something like this is the only way to resolve the ambiguity that is consistent with the charter. Additionally, getting rid of pointer provenance (or, equivalently, well-defining the behavior of the various integer/pointer conversions) is the only way to, as you want, "minimize the amount of user code that is broken", because much user code assumes that integer/pointer conversion behaves according to what TFA calls the concrete semantics.
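Concretely, the kind of case I have in mind looks something like this (my own sketch with made-up names, not anything taken from TFA):

#include <stdint.h>

int roundtrip(void) {
    int x = 1, y = 2;
    uintptr_t ix = (uintptr_t)&x;           /* a pointer converted to an integer */
    uintptr_t iy = (uintptr_t)&y;
    if (ix + sizeof(int) == iy) {           /* x and y happen to be adjacent */
        int *q = (int *)(ix + sizeof(int)); /* a pointer fabricated from an integer */
        *q = 5;                             /* under the concrete reading, this stores to y */
    }
    return x + y;                           /* under the rule I'm proposing, the compiler may
                                               not assume q cannot alias y, so it must not
                                               fold this return to 3 */
}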
I dispute that there is a "need" for pointer provenance, so much as a desire on the part of compiler developers to honor the sunk cost of various optimizations that relied on particular interpretations of the ambiguity.
> memory-based optimizations should not be permitted when two pointers have been converted to/from integers.
Just to be clear, what you are saying is that it is not legal to transform this function:
void bar();   /* unknown external function */

int foo() {
    int x = 5;
    bar();
    return x;
}
into this function:
int foo_opt() {
    bar();
    return 5;
}
And the end result of disallowing that kind of optimization is that effectively any function that transitively calls an unknown external function [1] gets -O0 performance. Note that one of the memory optimizations is moving values from memory to registers, which is in practice a prerequisite of virtually every useful optimization, including the bread-and-butter optimizations that provide multiple-× speedups.
I suspect many people--including you--would find such a semantics to have too wide a blast radius. And if you start shrinking the blast radius, even just to exempt such "obvious" cases as variables whose address is never taken, you introduce some notion of pointer provenance.
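For instance, carving out the address-not-taken case looks something like this (a sketch with made-up names; ext stands in for any unknown external function):

void ext(int *p);   /* unknown external function */

int baz(void) {
    int a = 5;      /* address never taken: "obviously" fine to keep in a register
                       across the call and fold into the return value */
    int b = 5;
    ext(&b);        /* b's address escapes: ext may have modified b, so b must be
                       reloaded from memory afterwards */
    return a + b;
}

But the moment you say "a is safe because its address was never taken", you are already tracking where pointers to a could have come from, which is a minimal form of provenance.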
That's how we arrived at the current state. We've given everything the "obvious" semantics, including making the "obvious" simplifying assumptions (such as variables whose address is never taken not being reachable through pointers). But we have discovered--and this took decades to find out, mind you--that using the "obvious" semantics causes contradictions [2].
[1] An unknown external function must be assumed to do anything that it is legal to do, so a compiler is forced to assume that it may be converting pointers to/from integers. And if that alone is sufficient to prohibit all memory-based optimizations, then it follows that merely calling an unknown external function prohibits all memory-based optimizations.
[2] And just to be clear, this isn't "this is breaking new heroic optimizations we're creating today", this is a matter of "30-year-old C compilers aren't compiling this code correctly under these semantics."
I would say that that transformation is legal; it doesn't even involve pointers. I don't think anything I said precludes escape analysis. How would provenance come into play here?
#include <stdint.h>

/* Scan memory upward from a local variable, looking for the byte pattern of
   an int with value 5 (assuming a 4-byte little-endian int), and overwrite it
   with 3 -- using only integer arithmetic, never pointer arithmetic. */
void bar() {
    int y = 0;
    int *py = &y;
    uintptr_t scan = (uintptr_t)py;
    while (1) {
        scan++;
        char *p = (char *)scan;
        if (p[0] == 5 && p[1] == 0 && p[2] == 0 && p[3] == 0) {
            *(int *)p = 3;
            break;
        }
    }
}
This code scans the stack looking for an int whose value is 5 and replaces it with 3. It's only undefined behavior if there's some notion of provenance: there's no pointer arithmetic; all the arithmetic happens on integers. There's not even a strict aliasing violation (since char is allowed to read anything). And yet, this code is capable of changing the value of x in foo to 3.
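To spell out the conflict, here is a hypothetical driver (my addition, assuming the foo and bar above are compiled and linked together; whether the scan actually finds x depends on the stack layout and on int being 4 bytes little-endian, none of which is guaranteed anywhere):

#include <stdio.h>

int foo(void);   /* the foo from above, linked together with this bar */

int main(void) {
    /* Under the concrete reading, foo() may return 3 here, because bar() can
       reach x through an integer-derived pointer; foo_opt() always returns 5.
       Under a provenance reading, bar() has undefined behavior, so the
       transformation of foo into foo_opt remains justified. */
    printf("%d\n", foo());
    return 0;
}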
> I don't think anything I said precludes escape analysis. How would provenance come into play here?
Could another approach be taken, where local variables are considered implicitly "register"? In that case this simple example has no problem whatsoever. The problem does still arise, unnecessarily, when the address of a local is taken but does not escape, but that ought to be rare.
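The address-taken-but-not-escaping case I mean would look something like this (a made-up example):

int clamp_sum(int v) {
    int lo = 0, hi = 100;
    int *slot = (v < 50) ? &lo : &hi;   /* an address is taken... */
    *slot = v;                          /* ...but it never leaves the function */
    return lo + hi;                     /* treating lo/hi as non-"register" merely because
                                           their address was taken would block keeping
                                           them in registers here */
}

A compiler that still wants to optimize this has to prove the pointer never escapes, which is the escape analysis mentioned above.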