
To clarify my position, I am not arguing against clarifying the situation. What I am advocating is that the resolution be that pointer provenance is severely limited: roughly speaking, pointers cast to/from integers might alias, and memory-based optimizations should not be permitted when two pointers have been converted to/from integers. I further claim that something like this is the only possible way to resolve the ambiguity that is consistent with the charter. Additionally, getting rid of pointer provenance (or equivalently: giving well-defined behavior to the various integer/pointer conversions) is the only way to, as you want, "minimize the amount of user code that is broken", because much user code assumes the integer/pointer conversion happens according to what TFA calls concrete semantics.
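As a rough sketch of the kind of user code I mean (the function name `write_via_integer` is made up for illustration, and it assumes the implementation provides `uintptr_t`), here is a pointer that goes through an integer and back before being used:

```c
#include <stdint.h>

/* Hypothetical helper: round-trips a pointer through uintptr_t and
 * writes through the result. Under concrete semantics the integer is
 * just the address, so `alias` and `target` refer to the same int. */
int write_via_integer(int *target, int value) {
    uintptr_t bits = (uintptr_t)(void *)target;  /* pointer -> integer */
    int *alias = (int *)(void *)bits;            /* integer -> pointer */
    *alias = value;                              /* store through the round-tripped pointer */
    return *target;                              /* read back through the original pointer */
}
```

Code like this expects the store through `alias` to be visible through `target`, which is what I would want any resolution to keep guaranteeing.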

I dispute that there is a "need" for pointer provenance; what exists is a desire on the part of compiler developers to honor the sunk cost of various optimizations that relied on particular interpretations of the ambiguity.




> memory-based optimizations should not be permitted when two pointers have been converted to/from integers.

Just to be clear, what you are saying is that it is not legal to transform this function:

  int foo() {
    int x = 5;
    bar();
    return x;
  }
into this function:

  int foo_opt() {
    bar();
    return 5;
  }
And the end result of disallowing that kind of optimization is that effectively any function that transitively calls an unknown external function [1] gets -O0 performance. Note that one of the memory optimizations is moving values from memory to registers, which is an effective prerequisite of virtually every useful optimization, including the bread-and-butter optimizations that provide multiple-× speedups.

I suspect many people--including you--would find such a semantics to have too wide a blast radius. And if you start shrinking the blast radius, even to include such "obvious cases" as address-not-taken, you introduce some notion of pointer provenance.

That's how we arrived at the current state. We've given everything the "obvious" semantics, including making the "obvious" simplifying assumptions (such as address-not-taken not being addressable by pointers). But we have discovered--and this took decades to find out, mind you--that using "obvious" semantics causes contradictions [2].

Take a look at the quiz that prompted this discussion: <https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-disc...> and <https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf>. There are a lot of cases involving pointer provenance where asking C experts "does this work" or "should this work" ends in head-scratching. Indeed, if you compare several claimed formal semantics of C, there are several cases where they disagree.

[1] An unknown external function must be assumed to do anything that it is legal to do, so a compiler is forced to assume that it is converting pointers to/from integers. And if that is sufficient to prohibit all memory-based optimizations, then it follows that calling an unknown external function is sufficient to prohibit all memory-based optimizations.

[2] And just to be clear, this isn't "this is breaking new heroic optimizations we're creating today", this is a matter of "30-year-old C compilers aren't compiling this code correctly under these semantics."


I would say that that transformation is legal, it doesn't even involve pointers. I don't think anything I said precludes escape analysis. How would provenance come into play here?


Suppose I invent bar as follows:

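  #include <stdint.h>  /* uintptr_t */

  void bar() {
    int y = 0;
    int *py = &y;
    /* From here on, all arithmetic is on an integer, not a pointer. */
    uintptr_t scan = (uintptr_t)py;
    while (1) {
      scan++;
      char *p = (char*)scan;
      /* Byte pattern of the int 5, assuming a 4-byte little-endian int. */
      if (p[0] == 5 && p[1] == 0 && p[2] == 0 && p[3] == 0) {
        *(int*)p = 3;
        break;
      }
    }
  }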
This code scans the stack for an int whose value is 5 and replaces it with 3. It's only undefined behavior if there's some notion of provenance: there's no pointer arithmetic, because the arithmetic happens only on integers, never on pointers. There's not even a strict aliasing violation (since char can read anything). And yet this code is capable of changing the value of x in foo to 3.

> I don't think anything I said precludes escape analysis. How would provenance come into play here?

Escape analysis is a form of pointer provenance.
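To make that concrete, here is a minimal sketch. The names `unknown_sink`, `unknown_call`, and `leaked` are invented stand-ins for code the optimizer cannot see into; nothing here comes from a real compiler or library.

```c
#include <stddef.h>

/* Hypothetical stand-ins for external code the optimizer cannot inspect. */
static int *leaked = NULL;

void unknown_sink(int *p) { leaked = p; }  /* remembers the pointer it was given */

void unknown_call(void) {
    if (leaked != NULL) {
        *leaked = 3;    /* legitimate write: the pointer was derived from &x */
        leaked = NULL;  /* do not keep a dangling pointer around */
    }
}

int local_stays_private(void) {
    int x = 5;
    unknown_call();
    return x;  /* no pointer to x was ever created, so folding this to 5 is sound */
}

int local_escapes(void) {
    int x = 5;
    unknown_sink(&x);  /* the address of x escapes here */
    unknown_call();
    return x;          /* x must be reloaded; the callee may have changed it */
}
```

Deciding that `local_stays_private` can return a constant means reasoning that no valid pointer to `x` can exist inside `unknown_call`, and that is a claim about where pointers come from, not about which addresses happen to be numerically reachable.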


> I would say that that transformation is legal, it doesn't even involve pointers.

You don't know that bar() isn't implemented as

    void bar() {
       int* ptr = make_a_valid_pointer_from_an_integer(1638541351);
       *ptr = 10;
    }
with 1638541351 sometimes being the address of x above?


Could another approach be taken, where local variables are considered implicitly “register”? In that case this simple example has no problem whatsoever. The problem does still arise unnecessarily if the address of a local is taken but does not escape, but that ought to be rare.
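Roughly what I have in mind, as a sketch in C (where `&` may not be applied to a `register` object); `opaque_call`, `foo_register`, and `foo_nonescaping` are made-up names for illustration:

```c
/* Hypothetical stand-in for an external function the compiler cannot see. */
void opaque_call(void) {}

int foo_register(void) {
    register int x = 5;  /* C forbids &x, so no pointer to x can be formed */
    opaque_call();
    /* int *px = &x;        would be rejected by a C compiler */
    return x;
}

int foo_nonescaping(void) {
    int x = 5;
    int *px = &x;  /* address taken, but never stored or passed anywhere */
    *px += 1;      /* x is now 6 */
    opaque_call();
    return x;      /* an implicit-register rule would treat x conservatively here */
}
```

The first function is the simple case that would need no further analysis; the second is the case where the rule would give up even though the pointer never leaves the function.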


Great example, can I please borrow that? Is a link sufficient attribution?


Go ahead.



