More specifically, Aria's Experiment (the provenance APIs) was stabilized in 1.84, which actually shipped as a release only last month, in January.
Socializing these APIs is important: if you write or review unsafe Rust, especially unsafe Rust which does stuff with pointers, knowing these APIs exist makes it less likely you will write or wave through unsound software that treats integers as having provenance. Sometimes you would get away with that, other times you would not. Good Rust programmers shouldn't let other Rust programmers use a usize where a pointer was the correct type, just as you wouldn't use a u32 where Ipv4Addr is the correct type.
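To make it concrete, here's the kind of thing I mean (a rough sketch of my own, not from the docs; the alignment helper is just an example, and align is assumed to be a power of two):

fn align_down_via_usize(p: *mut u8, align: usize) -> *mut u8 {
    // The integer round trip quietly relies on exposed provenance and hides
    // the fact that a pointer is flowing through here.
    let addr = p as usize;
    (addr & !(align - 1)) as *mut u8
}

fn align_down(p: *mut u8, align: usize) -> *mut u8 {
    // Same arithmetic, but the value stays a pointer and with_addr keeps its
    // provenance attached.
    p.with_addr(p.addr() & !(align - 1))
}

The second version is the one the now-stable APIs make possible, and it's the one that makes the reviewer's job easy.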
I found provenance easiest to think about after learning CHERI, since there the provenance isn't just an abstract concept that allows better optimisation; it's actually real data in the capability that affects whether it works at all, irrespective of optimisations. You literally can't cast an integer to a pointer, even in assembly.
Well, you can't "just" cast any of Rust's integers to a pointer because they're not necessarily the same size (one of the changes to Rust is the decision that usize is not defined to be the same size as a pointer, only the same size as a pointer's address), but C++ uintptr_t is in fact guaranteed to be able to hold a converted pointer, and presumably the compiler might just model it as an address the same as a pointer.
Today there are no settled provenance rules for C or C++, but there is a TS for C explaining PNVI-ae-udi. Under that model, if we cast a pointer to uintptr_t we're exposing the provenance of that pointer. In Rust's Exposed Provenance API we're not promised that this is even possible on every target (if we imagine compiling Rust for a CHERI target, clearly expose_provenance may not compile to any actual code; maybe the compiler barfs). But in C we're guaranteed the actual raw bits, so we're allowed to, for example, display them to the user, and under this TS the associated provenance is thereby exposed. So we can have the user later input an arbitrary integer, turn it into a pointer, and if they gave us the right number the resulting pointer must work and still be valid!
Is that a good idea? No. But it's very on brand for C and might become the de facto or even de jure semantic for pointer provenance in C++ too.
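For reference, the Rust side of that round trip looks roughly like this (a minimal sketch of the Exposed Provenance API; the value and types are just for illustration):

fn main() {
    let x: u32 = 42;
    let p: *const u32 = &x;

    // Analogous to casting to uintptr_t under PNVI-ae-udi: the address
    // escapes as a plain integer and the pointer's provenance is exposed.
    let addr: usize = p.expose_provenance();

    // Later, some integer (here the same one) is turned back into a pointer.
    // The compiler must assume it may carry any previously exposed provenance.
    let q: *const u32 = std::ptr::with_exposed_provenance(addr);

    unsafe { assert_eq!(*q, 42) };
}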
> Well, you can't "just" cast any of Rust's integers to a pointer because they're not necessarily the same size (one of the changes to Rust is the decision that usize is not defined to be the same size as a pointer, only the same size as a pointer's address
You can though, in practice on a non-CHERI system and if you avoid optimisations.
I haven't looked but I would expect the change you mentioned was made precisely because of CHERI, where a pointer and address are different sizes. On all other platforms Rust supports they are the same.
The reason this is a really good idea is that it is sometimes needed in the real world, and with PNVI-ae-udi it just works. Also, the alternative models have horrible semantics, e.g. either because you have provenance via integers (where the algebraic properties of integer arithmetic break) or because you get really subtle semantics that are hard to reason about (all the angelic / demonic indeterminism ideas).
Are you telling me that, if you could start over somehow, you'd suggest K&R should specify PNVI-ae-udi? Or do you mean (which I accept) only that, well, too bad, PNVI-ae-udi is the best we could do with C given where we started from 20+ years ago?
I actually think there's still significant risk that this is swamped by the "Bag of bits" people despite all your hard work. There are plenty of WG21 submissions whose subtext can be read that way; they want their "high level assembler" and they don't care that it's a myth and can't be made real, and they're not taking "No" for an answer. So, congratulations on the TS.
I actually think PNVI-ae-udi is a pretty good semantic model for a systems programming language, considering that it is essentially the only reasonable model for combining abstract pointers with low-level, integer-based pointers in a way that works well. I am not entirely sure I would allow one-past-the-end pointers if I were to design a language from scratch, but I haven't made up my mind about it.
While it makes sense that you landed on PNVI-ae-udi for C because of its prior baggage, I don't see how this makes it a good choice for a language without that baggage.
For one thing, I'm pretty sure systems languages don't need to combine these two ideas at all. In C there's a single pointer type, so it has to carry both uses, too bad; but in a hypothetical new language you don't need to do that, and I think it would be clearer not to.
The good news is that you already listed the two separate ideas. In C you need to have a single type which embodies both contradictory ideas, but that's actually a choice, albeit one it's far too late for C to change now.
The fact that choices you made in the past can't subsequently be changed is baggage. Even if you decide tomorrow that this was a waste of your time and you should never have worked on the provenance problem, you can't have that time back. [that's an example of personal baggage, in the case of C the early choice to only have the single type for both purposes is the baggage]
> ...another pass might notice the INT_MAX+1 and replace it by unreachable since UB code is “by definition” unreachable...
I would love to see the concrete example where this actually makes the program faster, to give me a better understanding of what I'm up against if I prefer to make integer field arithmetic well-defined.
AFAIK, the main reason these loops are so frustrating in C is because no one wanted to bite the bullet and insist that fast loops all use size_t instead of int--as otherwise on systems where int isn't native (which is, today, most of them) you have to emulate the wrap by adding a mask to the iteration--and so the compilers started inferring stuff and the language spec got damaged to let them work around it instead.
For what it's worth, Rust has made all integer arithmetic well-defined unless you explicitly opt in to the alternative on a case-by-case basis. Opting into the alternative is not something I've seen people having to do for performance.
Specifically, in release builds it's defined to be two's complement wrapping arithmetic. In debug builds it's defined to panic (in non-Rust speak, basically throw an exception) on overflow, and the language theoretically reserves the right to change the behavior in release builds to do the same, but that hasn't been done because the exception route does actually incur a non-negligible performance penalty.
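Roughly, the menu looks like this (a quick sketch, values arbitrary):

fn main() {
    let x: u8 = 255;

    // Plain `x + 1` panics in debug builds and wraps to 0 in release builds
    // (unless overflow-checks is changed in the profile).

    // The explicit, case-by-case opt-ins:
    assert_eq!(x.wrapping_add(1), 0);            // always wraps
    assert_eq!(x.checked_add(1), None);          // reports the overflow
    assert_eq!(x.saturating_add(1), 255);        // clamps at the maximum
    assert_eq!(x.overflowing_add(1), (0, true)); // wraps and tells you it did
}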
I agree with you that I don't see people opting in for performance, but it's also the case that C relies on this behavior a lot more than Rust does, since it doesn't have iterators. If more Rust programmers wrote "raw" for loops, maybe the overhead would be more significant. I don't know if there's any way to quantify this.
Also, it's not done by default because it's believed to incur the penalty. We didn't try to do any significant analysis here, mostly relying on the widespread belief. If we wanted to go against the grain, we'd have to demonstrate that it's incorrect, and that's difficult and time consuming, and there are other things to spend time on.
Google recently demonstrated pretty convincingly that bounds checks on array accesses in their C and C++ code have a very minor penalty that's absolutely worth it. I wonder if maybe we'll see something similar with this semantic in the future. I think it's lower priority because it's not a safety issue on its own to have the "maybe this, maybe that" behavior, since it's not undefined behavior.
I would imagine it wouldn't be too hard to collect this information, because you can just change the setting on a project and report back whether it makes no significant difference, breaks things (indicating your Rust is wrong and you should fix that), is a little slower, or is unacceptably slower.
Actually that information might help inform people choosing this setting for a new project. I think many assume that Wrapping is the conservative choice, but actually the Strict / panic semantic is going to be better for at least some of those people if they could afford it, and so if they knew that, say, 60% of projects find this works out, they might try it and benefit.
As a side effect a handful of people get to find out that their project has a nasty bug where it actually relies on wrapping for routine functionality but does not correctly use a wrapping type.
Also, for what it's worth, it's easy to change if you're building with cargo (as basically everyone does):
[profile.release]
overflow-checks = true
There have been a few projects where I've set this for one reason or another (e.g. because I'm dealing with intentionally wrapping arithmetic and I want the program to shout if I make a mistake).
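The pattern is to mark the intentional wraps in the type so the checks only fire on genuine mistakes. A sketch (the FNV-style hash here is just a stand-in example of deliberately wrapping math):

use std::num::Wrapping;

fn fnv1a(data: &[u8]) -> u32 {
    // Hash mixing is supposed to wrap, so say so in the type.
    let mut h = Wrapping(2166136261u32);
    for &b in data {
        h ^= Wrapping(b as u32);
        h *= Wrapping(16777619u32);
    }
    h.0
}

fn main() {
    // Any unmarked arithmetic elsewhere still panics when overflow-checks is
    // on, which is exactly the "shout at me if I make a mistake" behavior.
    println!("{:08x}", fnv1a(b"hello"));
}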
Just in case you are saying this to defend why you are using "int" instead of "size_t" (as otherwise I am not sure of your point and I hope you might clarify it), you would then want to use "ssize_t"... the issue I was referring to is not about signed vs. unsigned: it is about using a type which is smaller than a native full-width machine number type (such as when int is 32-bit on a 64-bit machine), as you then have to use suboptimal instructions or even add more code to emulate the wrapping semantics.
I understood your comment to be about well-definedness of wraparound and my point is that I do not want it to be well-defined for signed integers because it would take a tool away from me for catching bugs. This is true independent of the size of the type. For similar reasons I use int and not ssize_t because I want an error in my lifetime if there is a bug that causes my loop not to terminate. (And yes, sometimes int would be too small.)
Otherwise, there are many examples where signed integers lead to better code, here is one: https://godbolt.org/z/szGd1864a
I am sorry, I am missing something here... are you saying that you are using a smaller type "int" so that it wraps sooner? AFAIK, the compiler is going to use the fact that wrap is undefined to promote that int to a native type and make it not wrap until the same place ssize_t would!
OK, and yeah: based on your godbolt example, I do feel like you still don't understand all the ramifications, as you are relying on a bunch of undefined behavior to make your loop sort of work just because you want to avoid working with field arithmetic. Fixing the code is just as fast:
int f(unsigned long n)
{
    int sum = 0;
    for (unsigned long k = n; k != n + 4; ++k)
        sum += k;
    return sum;
}

f(unsigned long):
    lea eax, [6+rdi*4]
    ret
(Of course this is still a bit awkward, as this function should also clearly return an unsigned long. I am just trying to show you the minimal patch to the loop to demonstrate why you aren't buying anything... there is no benefit to this quirk of the compiler: please fix the code.)
(edit: BTW, I just looked at your about, and the fact that you are or were on the C WG is demoralizing, as I'd have thought this kind of confusion would be reason to rid this behavior from the spec... but C developers really seem to love doing signed inequalities on field elements :/.)
But using int doesn't wrap sooner or cause any errors at all, because it is automatically promoted to the native size due to this all being undefined behavior. Your example even directly demonstrates that, as the code had no chance of erroring in any way as it just returned the sum directly... the sum is even overflowing the int it is being stored into, that's how crazy the behavior of the code you wrote is.
And like, in addition to undermining your argument, that is also distressing, right? The code you wrote can be interpreted in a ton of different ways and the only reason the fast one makes any sense is if you squint hard enough so as to lose the details of the sizes and behaviors of all of the types that were declared... if you actually had to make the wrapping or errors you wish to see happen you'd have to have the slow code.
In contrast, the code using a native-sized unsigned type and != means only one thing, and that one thing is the fast code you said you wanted to get... that makes it the correct code. It isn't optimized to that code due to undefined behavior: mathematically that lea multiplication is identical, and if you simulated all of the steps--including the wrap (and, if we fixed the output type, the truncation)--you'd always get that result.
(And the reason I keep using the term "field" is because these types are elements of a modular finite field, and once you see them for what they are it becomes a lot easier to write correct code using them. There are even some modern languages where the native algebraic type is an element of a prime field, at which point confusing it for some kind of integer is extremely problematic.)
Let's start with something simple: Those types are not fields but quotient rings. If you have learned enough math to understand this, please come back.
As a C program, it doesn't actually work. [0] It's just C-as-pseudocode for LLVM IR. And LLVM has different semantics for what is allowed and denied than C does.
The only reason it doesn't work is because the stack grows downwards on x86 targets so p is at a higher address than q. If you swap the variables round, it compiles and runs printing the value assigned.
As the compiler is free to reorganise the variable order, that means you're relying on Undefined Behaviour for it to compile and do what you want. That's not something inherent to the way pointers work.
1 A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
...
6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
The issue for your code is that on line 2 you cast to uintptr_t the pointer (p+1), which is a pointer to char, not a pointer to void (and not a null pointer). So the result of your conversion is implementation-defined and need not be in the range of values of any integer type. Thus an implementation can assume that the result of the comparison (iq == ip) is always false and even omit the entire if-block.
> The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
> uintptr_t
Given this, are you suggesting that the code would be valid if the char pointer was first cast to void * like so:
char p[1], q[1] = {0};
uintptr_t ip = (void *)(p+1);
uintptr_t iq = (void *)q;
if (iq == ip) {
    *(char *)(void *)iq = 10;
    print(q[0]);
}
Additionally, it seems the code may be valid as-is:
> A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.
You still have a few bugs to fix (need to cast to uintptr_t after casting to void *, need to use printf with a proper format string) but now your code is valid. I believe it's still implementation-defined because different implementations are free to lay out memory as they wish, say by including the use of padding bytes between values, or by changing the location of the stack and the direction it grows (up or down in memory). This means you cannot assume that the array q directly follows the array p in memory (they may be in the opposite order or they may be separated by padding bytes).
I created a working version of your code on Godbolt Compiler Explorer [1]. I tried it with a few different compilers and the code runs. The comparison evaluates to false on every compiler that I could get to work at all. Some don't seem to print anything at all, though the assembly they generate seems like it ought to.
As a general rule of thumb for C, you should assume it is impossible to reach the object pointed to by one pointer via another pointer, unless the second pointer directly aliases the first. You also cannot assume that the integer representation of a pointer corresponds directly to a valid address in memory. Virtual memory (with address translation), fat pointers, segmented memory, etc. are all possible which means the pointer's actual bit pattern should be considered an opaque structure handled by the implementation, not an integer for you to manipulate and then cast back to a pointer.
The main reason for wanting to convert a pointer into a uintptr_t type is to serialize it into some other structure in a one-way fashion, such as to store it in a hash table.
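For instance (a made-up sketch, names mine), using only the address as a key and never turning it back into a pointer:

use std::collections::HashMap;

fn remember<T>(table: &mut HashMap<usize, String>, obj: &T, note: &str) {
    // addr() gives just the address; the provenance is deliberately discarded,
    // which is fine because we never reconstruct a pointer from the key.
    let key = (obj as *const T).addr();
    table.insert(key, note.to_string());
}

fn main() {
    let x = 5u32;
    let mut table = HashMap::new();
    remember(&mut table, &x, "seen it");
    println!("{} entries", table.len());
}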
This code doesn't assume that q follows p, that's the trick to it. The way you are trying to explain this to me, I feel like you haven't read the article or understood the code snippet.
After (re)reading the article I've come to the conclusion that with proper provenance tracking through the cast to an integer, the comparison:
(iq == ip)
should be replaced (at compile time) with the constant 0 (or false, if you like). Since iq and ip have different provenances -- even though they may (or may not) be bitwise equal (which is entirely implementation-dependent in C) -- they can never be considered equal, so falsity can be deduced at compile time.
It should also be considered invalid (and a compile-time error) to cast any old plain integer (one not derived from a pointer) to a pointer, regardless of whether or not the address of said pointer lands on a live object, because an integer created by any means other than a cast from a pointer should have empty provenance.
Edit: Ahhh, this provenance story is way messier than I thought. One serious problem with tracking provenance is that functions like memcpy become a huge headache to deal with.
Strict aliasing isn't the root problem here, because Rust, which doesn't have C-style strict aliasing, would similarly exhibit the same theoretical miscompilation in the absence of provenance.
It seems to me like the final optimization is the only one that is not correct and obviously so.
> The final optimization notices that q is never written to, so we can replace q[0] by its initial value 0.
However the line right above is
*(p+1) = 10;
The type of p is char[] which decays into a pointer and the type of (p+1) is char *. Because it is char * and not char * restrict, any assignment through such a pointer can be assumed to modify any and all data in a program.
Therefore, the assumption that p[0] does not change and can be swapped out is not somehow surprisingly wrong, but simply entirely unfounded.
It is up to the compiler to prove for itself that assignment through pointers does not change data it's interested in from before to after the assignment. (except, of course, in the case of restrict pointers, where the programmer makes this guarantee). This is not very difficult, at least conceptually.
If the assignment was
*p = 10;
Well, we know that *p means p[0], which is within the bounds of the array, and we know that p and q do not overlap. Therefore, this assignment can not modify q.
In the case of the first assignment, we know that *(p+1) means p[1], which is out of bounds for the array. So we do not know whether or not an assignment through *(p+1) modifies q, and we can not assume it does not.
It's not always correct, but that's kinda the tradeoff with undefined behavior. It's your responsibility to know the defined spec and stay within it, not the compiler's to prove you did it correctly in all cases.
>Since q and p point to different local variables, a pointer derived from p cannot alias q[0], and hence we know that this write cannot affect the value stored at q[0].
Pointer math outside the array bounds isn't defined, so it can be treated as anything, including "cannot possibly do X, so we can optimize it out". A compiler (or a tool like Valgrind) that preferred to spend memory to improve correctness might add padding between variables to prevent this kind of small-OOB action, which would make the output change again, which is why this is undefined and you can't rely on it. A compiler that was feeling moody could put that array at the end of your page of addressable memory, and cause a segfault when you try to write to it.
In the actual code *(p+1) is only ever accessed if it has been verified to be the same address as q[0], which is an object which we know exists and we have full access to.
The problem is that pointers of the same type to the same address, but originating from pointer arithmetic on pointers derived from different objects, have different semantics.
In other words, a cast round trip like (char *)(uintptr_t)(p+1) is not semantically a NOP.
I believe this goes against the principle of least astonishment (I find it quite astonishing, personally). I believe that, even if the standard does not guarantee the expected behavior, implementations should still guarantee it. And they can do the optimizations they want in other ways, as I have outlined.
While I generally agree, and I much prefer my langs with minimal undefined behavior, this gets solidly into Hyrum's Law territory, where practically every significant optimization could break something and therefore cannot be done. C just doesn't have rich enough semantics to allow much else.
You can find some languages that try to adhere to that, but C is extremely clearly not part of that group. It never has been, and it never will be.
The problem with C is not necessarily the idea of undefined behavior. It's the culture around undefined behavior. There is no culture of avoiding UB in a programming language community that glorifies UB as necessary. That's the paradox.
It's like you're supposed to cross an enemy's minefield, with the enemy laying ever more mines in your path, and yet you're supposed to walk through the minefield and hope for the best.
You'd have thought that C programmers have the best tools and techniques to avoid UB by now, so that UB would be a non-issue, a relic of the past that people remember suffering from in the 90s.
> The type of p is char[] which decays into a pointer and the type of (p+1) is char *. Because it is char * and not char * restrict, any assignment through such a pointer can be assumed to modify any and all data in a program.
Out of bounds access is classic UB. The compiler is allowed to assume that you never access an array out of bounds. Since you never access p out of bounds, the effects of any write to p will not affect any "object" that is not p.
>Therefore, the assumption that p[0] does not change and can be swapped out is not somehow surprisingly wrong, but simply entirely unfounded.
I hope you notice your small mistake. It's print(q[0]), not print(p[0]), there is no "p" in the print. It's q[0] and q[0] is not p. The write to p did not affect q, because that would be UB.
>It is up to the compiler to prove for itself that assignment through pointers does not change data it's interested in from before to after the assignment. (except, of course, in the case of restrict pointers, where the programmer makes this guarantee). This is not very difficult, at least conceptually.
The code accesses q[0], not iq. You might say that if you did print(*(char *)iq), the compiler should definitively know that iq and ip are the same and therefore a write through ip would also be a write through iq, but the code uses q[0], which is a memory address that is not proven to be identical to ip.
Then again, what you're hoping for here is that the compiler fixes the UB for you, which is not the point of UB. You're supposed to rely on your godly C programmer instincts to never write code with UB in the first place.
This statement "any assignment through such a pointer can be assumed to modify any and all data in a program." is incorrect. You are not allowed to use a pointer to one object to modify another. At least this is what the C standard (somewhat imprecisely) implies and compilers generally exploit for optimization.
Most people agree that the right solution is that the integer round trip cannot simply be assumed to be a no-op, and that it has the effect that the recreated pointer can be used to point to the other object (as proposed by us in TS 6010). (So in this sense you are right.) The LLVM maintainer stated recently that they are going to fix this.
> You are not allowed to use a pointer to one object to modify another.
In this context, you means the author of a program written in Standard C. Implementations can choose to allow this (and in practice all do, because of the nature of pointers). And obviously, a code transformer for a particular implementation of C would only need to generate code valid for that implementation of C, not valid Standard C code.
True, an implementation could choose to allow this. But it is a bit questionable to say they do in practice, because most will miscompile your code in this case in various circumstances.
I have some questions. First, do pointers that do not come from specifying a statically sized array also have provenance? That is if I say `char* c; size_t n = get_user_input(); c = malloc(n);` does c have provenance? If after the above I created an array like so `char str[n]` and then assigned such that it pointed to the same region of memory that malloc() returned, then would the provenance of both pointer be equal?
Also, in Rust why are pointer to integer casts allowed at all if they lose provenance?
Also, why exactly is one past end of bounds a valid pointer you can calculate?
Also, it seems like you cannot just create a random pointer to an unallocated address, not just that you can’t write to it. Is that correct?
Lastly, is provenance just a fancy way of saying metadata? Is there a reason something like “pointer metadata” or “object metadata” wasn’t used?
>That is if I say `char* c; size_t n = get_user_input(); c = malloc(n);` does c have provenance?
Yes.
>If after the above I created an array like so `char str[n]` and then assigned such that it pointed to the same region of memory that malloc() returned, then would the provenance of both pointer be equal?
Yes.
>Also, in Rust why are pointer to integer casts allowed at all if they lose provenance?
It would be a breaking change to remove the ability to cast. Instead Rust added APIs to expose the provenance when casting from ptr to int and then pick the provenance back up when casting from int to ptr.
>Also, why exactly is one past end of bounds a valid pointer you can calculate?
It's useful to allow it to make `if (ptr < end)` with exclusive `end` possible. The alternative of only allowing `if (ptr <= end)` with inclusive `end` is not as trivial if it needs to handle empty ranges (the exclusive version can just use `end = start` to denote an empty range).
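Sketched out with raw pointers (my own toy example, not from any particular codebase):

unsafe fn sum(start: *const u32, len: usize) -> u32 {
    // Safety: the caller promises start..start+len is one valid, live array.
    unsafe {
        let end = start.add(len); // one past the last element; never dereferenced
        let mut p = start;
        let mut total = 0;
        while p < end {           // exclusive end: start == end is the empty range
            total += *p;
            p = p.add(1);
        }
        total
    }
}

fn main() {
    let v = [1u32, 2, 3];
    unsafe { assert_eq!(sum(v.as_ptr(), v.len()), 6) };
}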
>Lastly, is provenance just a fancy way of saying metadata?
Pointer metadata in Rust at least refers to a different thing, namely the second half of fat pointers. For slice pointers the fat pointer contains data ptr and length, so the length is the metadata. For trait object pointers the fat pointer contains data ptr and vtable ptr, so the vtable ptr is the metadata.
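You can see the fat-pointer layout directly in the pointer sizes (a small sketch of mine):

fn main() {
    use std::mem::size_of;

    // A thin pointer is just an address.
    assert_eq!(size_of::<*const u8>(), size_of::<usize>());

    // A slice pointer is (data ptr, length), so twice the size.
    assert_eq!(size_of::<*const [u8]>(), 2 * size_of::<usize>());

    // A trait-object pointer is (data ptr, vtable ptr), also twice the size.
    assert_eq!(size_of::<*const dyn std::fmt::Debug>(), 2 * size_of::<usize>());
}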
> It's useful to allow it to make `if (ptr < end)` with exclusive `end` possible. The alternative of only allowing `if (ptr <= end)` with inclusive `end` is not as trivial if it needs to handle empty ranges (the exclusive version can just use `end = start` to denote an empty range).
As an aside: I find it fascinating that half-open ranges turn out to be - universally as far as I can tell - the right choice for ranges in software. As well as being able to store an empty range, they also allow you to calculate the length of the range by trivially subtracting end - start.
For the growable array and slice types (C++ std::vector and std::span), at a low level we should actually store ptr and length, not start and end; it makes a tiny but measurable performance difference.
This is one of the tiny perf leaks from C++ ABI conservation. IIRC all three popular std::vector implementations got this wrong decades ago and are now ABI-frozen, so too bad. The design of std::span is correct though, for whatever that's worth.
Storing the length instead of the address past the last element does not change anything in the answer to the question posed by the previous poster.
The length of an array is the index one beyond the last element of the array, which would be invalid in an indexed expression for accessing an array element.
Exactly as the address beyond the last element of the array is useful when using pointers, the length, i.e. the index beyond the last element of the array, is useful when using indices.
Moreover, regardless of how arrays are handled in a high-level programming language, on CPU architectures that provide SIB (scaled index + base) addressing modes instead of auto-incremented/auto-decremented addressing modes, e.g. on x86-64, there are frequently encountered cases where the optimal implementation of a loop in machine instructions requires a base register initialized to the address of the element beyond the array limit, together with an index register that contains negative indices and is incremented at each iteration until it becomes non-negative, in order to access the array in increasing order.
(Positive indices that are decremented until becoming negative, together with a base register containing the address of the first array element, are used when the arrays must be accessed in decreasing order.)
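In source form (a sketch of mine, not compiler output; the compiler would pick the base+scaled-index encoding itself) the increasing-order pattern is roughly:

unsafe fn sum_from_end(data: *const u32, len: usize) -> u32 {
    // Safety: the caller promises data..data+len is one valid, live array.
    unsafe {
        let end = data.add(len);      // base: the address one past the last element
        let mut i = -(len as isize);  // negative index
        let mut total = 0;
        while i != 0 {
            total += *end.offset(i);  // end + i*4 maps onto a SIB addressing mode
            i += 1;                   // the loop test is just "has i reached zero"
        }
        total
    }
}

fn main() {
    let v = [1u32, 2, 3, 4];
    unsafe { assert_eq!(sum_from_end(v.as_ptr(), v.len()), 10) };
}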
In optimal implementations on x86-64 and similar ISAs it does not matter much which value, between the length and the address past the array, you choose to store in an array descriptor. When the loop is initialized in machine instructions you may frequently need the third value that is not stored, but that is pretty much irrelevant, because an extra addition or subtraction is usually done in zero time on modern CPUs (i.e. concurrently with the other instructions that would have been executed anyway, even when the extra addition/subtraction would not have been necessary).
Because arrays are more frequently accessed in increasing order, even when that is not necessary, for the purpose of just accessing the array it would be optimal to store in the array descriptor the length together with the address of the element beyond the array. However this would be worse in conjunction with legacy APIs that would expect the address of the first element, which would require an extra subtraction, and also when accessing just an array element, not the whole array, when an extra subtraction would also be needed.
In any case, on x86-64 storing both addresses instead of the first address with the length does not have any measurable performance difference in correct implementations, because the number of machine instructions needed for implementing a loop is the same in the most frequent cases. If there are performance differences, it is likely that the machine code generation is bad.
> > If after the above I created an array like so `char str[n]` and then assigned such that it pointed to the same region of memory that malloc() returned, then would the provenance of both pointer be equal?
> Yes.
Uh, no. This is flatly untrue. You cannot "assign an array such that it points to a region of memory". Arrays are not pointers, they do not point to anything.
I assumed they meant `char (*str)[5]`, i.e. a pointer to an array, because they wanted to reinterpret the malloc result as an array of the malloc'd length.
> If after the above I created an array like so `char str[n]` and then assigned such that it pointed to the same region of memory that malloc() returned, then would the provenance of both pointer be equal?
I don't think you can perform such assignment.
> Also, in Rust why are pointer to integer casts allowed at all if they lose provenance?
Sometimes you want the numeric address to perform some arithmetic/bitwise operation on it (e.g. for tagged pointers)
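E.g. something like this sketch (mine, using the strict-provenance methods; the tag layout is made up):

fn set_tag<T>(p: *mut T, tag: usize) -> *mut T {
    // map_addr keeps the provenance while letting us do integer math on the
    // address; here we stash a tag in the two low (alignment) bits.
    p.map_addr(|addr| addr | (tag & 0b11))
}

fn untag<T>(p: *mut T) -> (*mut T, usize) {
    let tag = p.addr() & 0b11;
    (p.map_addr(|addr| addr & !0b11), tag)
}

fn main() {
    let mut x: u32 = 7; // u32 is 4-byte aligned, so the two low bits are free
    let tagged = set_tag(&mut x as *mut u32, 0b01);
    let (plain, tag) = untag(tagged);
    assert_eq!(tag, 0b01);
    unsafe { assert_eq!(*plain, 7) };
}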
> Also, why exactly is one past end of bounds a valid pointer you can calculate?
Because it's incredibly useful for iterating. It is also used for representing an empty tail slice. It's something inherited from C/C++ and likely won't go away.
> Also, it seems like you cannot just create a random pointer to an unallocated address, not just that you can’t write to it. Is that correct?
You can create such a pointer, it just won't be valid.
> Lastly, is provenance just a fancy way of saying metadata? Is there a reason something like “pointer metadata” or “object metadata” wasn’t used?
It's not metadata that exists at runtime. It only exists at compile time, to allow certain common optimizations.
>Clearly, one of the three optimizations is incorrect in the sense that it introduced a change in program behavior.
What about declaring that all three optimizations are correct, and this program invokes UB? More precisely, it compares a one-past-the-end pointer of one object with a pointer pointing into an entirely different object. The one-past-the-end pointer is special because numerically it points outside the object. I think declaring that a one-past-the-end pointer can be compared only with itself, with a valid pointer into the same object, or with NULL wouldn't be too unreasonable (comparing a one-past-the-end pointer with a pointer to another object would invoke UB).
But it does not. It compares integers. Comparing integers is always well-defined.
If you want to declare it UB, you will have to say that the integers are not simple bits, that they also "remember" which pointer they came from. This is essentially what we mean when we say that integers have provenance, and it's not a good idea because it doesn't let you do some very common optimizations on integers.
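A toy illustration (mine, not from the article) of the kind of integer optimization at stake:

fn pick(a: usize, b: usize) -> usize {
    if a == b {
        // Classic GVN / copy propagation: since a == b on this path, the
        // compiler wants to freely substitute `a` for `b` (or vice versa).
        // For plain integers that is trivially sound. If integers carried
        // provenance, a and b could be equal as numbers yet "remember"
        // different allocations, and a later int-to-pointer cast of the
        // substituted value could be allowed to access a different object.
        b
    } else {
        0
    }
}

Keeping provenance out of integers is what keeps substitutions like that sound.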
The downside of defining things like pointer provenance in the IL is that it ties the semantics of your IL to C. If you are writing another language and want to have very different semantics - say a flat memory model where pointers are integer indices into memory, or defined signed overflow - then you will have to be very careful and constantly fight or work around the C semantics. Basically the same downside as when you use C as a compilation target (like, for example, Nim does/did).
Where possible, undefined behavior at the LLVM IR level is always optional, controlled by various flags, attributes and metadata. Things like signed integer overflow, or even the forward progress guarantee are controlled that way, and admit a wide range of possible language semantics.
Provenance (and to some degree the entire memory model) is the big exception to this. It's a very core part of the IR semantics, and an optimizing compiler without provenance would have to use an entirely different methodology to justify many optimizations. I don't believe that anyone has yet been able to demonstrate the feasibility of a highly optimizing compiler without provenance.
The IR doesn't have to cater to programmers in the same way, and can support many choices of semantics simultaneously in a single program.
For example, LLVM IR is used to compile C both with and without strict aliasing. The choice is lowered into metadata applied to individual pointer operations. Similarly with signed integer overflow.
Nim could even have gotten some of this benefit when compiling to C if it had been willing to target a specific set of C compilers, rather than standard C.
I could never understand why "pointers are complicated" or "hard"?
They just point at things. Sure there are some semantic quirks if you want to go there, to idk obfuscate the code or to make things more complex than they really are.
I just don't get it.
Been programming in C for decades now and never had any issues with this.
Yes, I've done fair share of these, but I don't see them as different from any other incorrect code I may write. I agree that sometimes it may be more challenging to debug, as it may not be immediately obvious that e.g. I am writing memory I am not supposed to.
I don't see how the proposed solution improves matters. Even without the second optimisation, it seems that the third could still occur and cause issues. I suppose the compiler must only recognise the potential for a pointer writing arbitrary data when it is first cast to an integer.
What this really says is that C giving one-past-the-end pointers meaningful semantics may have been a mistake. So was trying to give aliasing of writable objects meaningful semantics.
> one of the three optimizations is incorrect ... but which one is it?
> The final optimization notices that q is never written to
Isn't it obviously this assumption that's blatantly wrong? You have no clue q is never written to because the address was saved somewhere and you clearly have no idea what happened to the address afterward. It could've obviously been used to write to the variable. Is that so complicated?
This feels like saying "I closed my eyes. I no longer see you. You are clearly not in front of me anymore, so I punch into the air and hit you. Which of my assumptions was wrong?" Such a mystery!
No it wasn't. The address was taken, compared, then discarded; the compiler knows that it's gone out of scope and can't be used again.
> It could've obviously been used to write to the variable. Is that so complicated?
It's arbitrarily complicated, in the general case, and if you can't solve it for the general case then solving it for specific special cases would only make your language semantics more confusing - if the address of q is nowhere accessible then is it safe to treat it as unused? Well, maybe, maybe not, sometimes people convert a pointer to an integer, xor it with a constant, then later xor it back. If the program converts two pointers to integers, then takes the nth root of a^n + b^n, does the compiler have to solve Fermat's last theorem to decide whether that might alias the third pointer or not?
(Or are you saying that any value whose address was taken, ever, even if that address is then discarded, must then be treated as aliasing everything else? That would create massive performance regressions)
I do not agree with you. What you are saying is that the compiler must keep the variable ordering on the stack the same. That is the only way that `&(p+1) == &q` would hold consistently.
Now if we have `char p[1] = {0}; int a; char q[1] = {0};`, the compiler cannot move p and q together so that a can be properly aligned with less padding (I assume alignment is a concern on stack space as well?). The compiler also cannot do any loop-invariant hoisting. It cannot remove any unused variables. Because all of them can change the behavior of the program just by existing. Any random pointer could have been aliasing one of those positions.
Every single optimisation would require full program analysis to determine whether something could possibly alias. And if you do any non trivial pointer arithmetic (after integer casting) or use user input in it, you lose that. Not just in that place but everywhere in the program.
> I do not agree with you. What you are saying is that the compiler must keep the variable ordering on the stack the same. That is the only way that `&(p+1) == &q` would hold consistently.
I think we're speaking past each other because that is in no way what I'm saying.
All I said was: there was a memory write... to an address... that came from an integer... that came from somewhere (a pointer). Since the compiler presumably has no idea where that "somewhere" was, it could've been q. It can't prove that it isn't q, because it knows that q's address was taken somewhere, and it has no idea where that address went. Hence it can't assume the target of the write can't have been q.
This assumes nothing about stacks, frames, or the ordering of any variables. You could order them any way you want and what I said would hold. It's literally just discussing how information could've flowed from one point to another.
It can't prove it isn't q even if q's address was never taken though. It does not matter that q's address was taken. p+1 would still point to q. If you have a pointer in scope, you can no longer remove or reorder any write at all. Those pointers could have been pointing to volatile locations for all the compiler knows. If there is multithreading, it cannot remove any write at all since the other thread could have had a pointer to that location.
It no longer matters that a variable did not have its address taken at any point since that address could have been taken via pointer arithmetic from somewhere else.
> It can't prove it isn't q even if q's address was never taken though.
I was imprecise with the wording.
What I meant is that, when q's address is taken and shoved into an integer, the compiler (assuming it gives up upon seeing integer conversion) fails to prove that the address points provably outside q on that path. Thus it needs to assume that the address might, in fact, provably point inside q (i.e. it might be guaranteed to overlap q). Thus it cannot assume q is unmodified.
If q's address was never taken, then the compiler could prove that q's address was unknown on that path, and thus could be anything. Because this proves there could exist an execution where the target doesn't overlap q, then the compiler could assume that an unmodified q is a valid execution. i.e. it could assume q is unmodified.
To put it in perhaps clearer terms: when the address of q is untaken, that means the program's (defined) behavior is completely invariant with respect to (i.e. symmetric w.r.t., i.e. independent of) q's address. Thus it can act as if the target of the write is not q (...because we just established the program's defined behavior is independent of q's address). Whereas it cannot make that assumption when q's address is leaked somewhere.
Hmmm... that sounds like exposed provenance (https://doc.rust-lang.org/stable/std/ptr/index.html#exposed-...) to me. When a pointer's provenance is exposed, its allocation is put in the global exposed list. Any pointer without a clear provenance then can alias any (all) of the exposed-provenance allocations. What you are saying is that you want all pointers to behave this way. While that can prevent miscompiles, I would assume that it would also lead to massive slowdowns in runtime performance.
I hadn't heard of that before, cool. That sounds similar, yeah. (At a quick glance at least. I can't tell if it's exactly equivalent but it certainly seems close.)
> I would assume that that would also lead to massive slowdowns in runtime performance.
My responses are:
1. I am amused that the implication of this would be that Rust is slower than C/C++.
2. I would not assume such a thing purely on faith. It may be true, but there have been lots of poor specifications in the standards based on erroneous presumptions about the performance cost/benefit trade-offs that subsequently didn't hold. I won't even go into the ones that touch UB -- just look at the argument evaluation order that they finally caved on and made stricter from C++14 to C++17. I find the standards are overzealous w.r.t. optimization in a lot of cases, and it takes them a long time to admit that.
3. If that's actually true in this particular case, the correct solution to that would've been to provide an opt-in flag to break programs (think -ffast-math) rather than laying incorrectness as the foundation.
1. Rust has proper references with much stricter aliasing requirements. So no, it would be C that would slow down. Rust would still be able to take advantage of these optimisations.
2. In this case, I would say the effects are much more obvious. Just take the example in the article. The compiler cannot replace a pointer access with its value if there has been any other pointer access in between. The compiler also cannot reorder any statements with pointer accesses. They would basically behave as memory barriers (if I understand them correctly). That will cause slowdowns at runtime (the compile time will speed up though).
3. Sure. Perhaps that is an option. I wouldn't call it a flag to "break program" though. Maybe a config file that tells the compiler which assumptions can be made (strict aliasing, associative floats, threading safe, etc).
EDIT: In point 3, I would also add assumptions like whether the code can be changed at runtime, etc (take a pointer to a function then write your own code there).
> In this case, I would say the effects are much more obvious. Just take the example in the article. The compiler cannot replace a pointer access with its value if there has been any other pointer access in between.
I have no idea what you mean. The whole point of this discussion is that the example in the article should not be optimized in the manner they're describing, because it would be wrong. That is... a good thing. Not a bad thing.
And the compiler certainly could replace a pointer access with its value if there has been other pointer access in between, as long as that pointer wasn't leaked away into some opaque place via an integer.
> The compiler also cannot reorder any statements with pointer accesses. They would basically behave as memory barriers (if I understand them correctly).
No, I don't think you understood what I'm saying. See above. It would still be perfectly legal for the compiler to transform
char p[1] = {0};
opaque();
return p[0];
to
opaque();
return 0;
under what I wrote above, the intervening statement notwithstanding.
How come? Assuming we're in a function, the `q` in question has just been allocated on the stack, and its address never left the context of the fragment, e.g. was never used as an argument of any function call; even the `print` receives the value of `q[0]`, not the address. Nothing else should know where `q` even is, let alone modify it behind the scenes.
Because for many conforming (not necessarily strictly conforming) programs, should and does are different things, and many programs are written to be conforming rather than strictly conforming (hence why most non-trivial sane projects (including Clang in some cases, amusingly enough) build themselves with flags that disable many of these styles of checks).
The argument goes like this:
Assuming that p lives at 0x0 and q lives at 0x1 (this is the only interesting case anyway, if this isn't the case the program clearly prints nothing)...
p[0] = {0}
q[0] = {0}
Line 1: ip = undefined, iq = undefined
Line 2: ip = 0x0 + 1 = 0x1 (assigned), iq = undefined
Line 3: ip = 0x1, iq = 0x1 (assigned)
Line 4: ip = 0x1, iq = 0x1 (ip == iq, so take branch, goto line 5)
Line 5: ip = 0x1, iq = 0x1
- p[1] = 10.
- Because p[0] is at 0x0, p[1] must be at 0x1
- (address of p[1]) = 0x1 and (address of q[0]) = 0x1 and 0x1 == 0x1
- Therefore because these pointers have the same address, these two objects obviously must alias each other at this point
Line 6: print(q[0]); -- should print 10, because p[1] = 10, which means q[0] = 10, because they're at the same location
The argument against this is that a pointer really is an address plus some hidden data that tracks what allocation it came from, therefore you can't treat p[1] as equaling q[0]. I find myself tending to agree with the original argument (this is C after all) rather than this counterargument, but I understand it nonetheless.
This touches a pain point in the evolution of C in general. A lot of the time C is treated as a "portable assembler"[1], which many programmers who are writing C code for things like operating system kernels and other systems level code expect and like, but because a lot of that code makes for non-strictly conforming programs, many popular compilers keep adding optimizations that target strictly conforming programs only, at the expense of... well, a lot of the reason people use (and have to use) C.
[1]: Many might take issue with the characterization of C, but it's how it was characterized even in things like the 1st ed of "The C Programming Language".
Note: the fact that this was actually done through q[0] rather than p[1] isn't really material for the argument, but this was an error I made in the original argumentation (that I can't go back and edit now) so just wanting to make that correction.
If anything, it makes the argument stronger, as the program as written is valid syntactically and semantically rather than relying on undefined behavior.
Ah, I see, it's trivial out-of-bounds array access, and the compiler's inability to notice that. The cause of innumerable data corruption bugs, crashes, and exploits.
I wish C did not have implicit pointer aliasing; it already has explicit aliasing via `union`. (And a different handling of nulls. And no UD. And a pony. Sadly, I don't have a time machine to travel back to 1969.)
So worth noting that the original code actually exhibits no undefined behavior as written (only q is accessed). I erred in my original argument by looking at the "first optimization" being applied.
I'd still argue that the third optimization would be incorrect here: if a compiler can determine that p+1 == q and that replacing q with p+1 is legal, then it obviously knows that those two locations alias (kinda by definition of being able to replace that) and therefore can't apply the 3rd optimization.
I mean yes in this case the problem is you're doing a dereference out-of-bounds, which makes a program not strictly conforming.
The problem tends to be that the real world is much more (essentially) complex, and the domain of C is the one that needs to be able to deal with that essential complexity, which includes things like type punning and int-to-ptr casts and the like, which make programs not strictly conforming. As it stands, a lot of these rules make it impossible to implement a lot of... C libraries... in ISO C, let alone things like operating system kernels and the like.
> And no UD
I assume this means undefined behavior? But in any case, undefined behavior is okay to have in a language, it's good for when there are no portable semantics for a particular operation or failure mode[1]. It's generally essential (although some undefined behaviors would definitely be better served being implementation defined or unspecified).
It is also okay for programmers to eschew this portability (so if they want to narrow their focus from "anything that can run strictly conforming C" to a superset that targets say a specific memory model or set of processors).
[1]: For example, at an extreme example, what happens if the CPU starts burning up? Obviously the standard can't describe such semantics for how a program should behave in the face of this.
I agree, type punning exists, this is why I mentioned `union`, invented to make some type punning easier. An explicit mechanism like that is essential. Doing it implicitly is, to my mind, too dangerous to be beneficial.
Same with undefined behavior. I'm fine with implementation-dependent behavior (like what size is `int`, etc). But modern compilers see UD and feel free to do anything, even removing the code in question. Such things should not be silent, or even a warning. Instead it should be something like `unsafe` sections in Rust: "I know what I'm doing, the usual safety assumptions are irrelevant here, do what I literally say". Doing weird things is sometimes required e.g. to handle memory-mapped hardware registers, etc. But such areas should be clearly delineated, for they are usually few. They also tend to not be portable.
A CPU burning up, hardware malfunctioning, etc is usually a situation where all bets are off. An unanticipated pointer aliasing in a trivial program running on perfectly working hardware is a different case.
> which many programmers who are writing C code for things like operating system kernels and other systems level code expect and like
Strong agree. Isn't this the whole point of C? I don't have any insight into why the standards committee could see it any other way. It seems like the whole situation got hijacked by optimization writers who had somehow lost touch with their users. But maybe there is a more coherent explanation of how this happened.
Do you suppose that most C programmers enjoy the fact that random aliasing and free out-of-bounds array access can lead to silent data corruption, as shown in the example?
This example isn’t showing silent data corruption due to out of bounds array access. It is showing that LLVM and others do not correctly apply optimizations to the original code. The program is correct and does what it’s supposed to in the original example.
You mean, accessing an array member past the declared array bounds and overwriting another variable that happens to be allocated right next to it is what the programmer should have consciously planned to do here?
IMHO, a planned aliasing would look like a union. The declaration we see does not even guarantee that q follows p in the address space, or maybe even that p and q are adjacent.
(I know that int[0] sort of means "no bounds", and I think it should be an error for a stack-allocated variable which ought to have definite bounds.)
With respect to "the example". The code in the linked article does not perform out of bounds array access. `ip` points to one-past-end, which is legal, and it is never dereferenced (in the original, correct, user written code). All issues described in the article relate to faulty compiler transformations.
On the other hand, if you are arguing that the language should protect the programmer from "silent data corruption" by limiting their freedom to implement correct low-level memory access or pointer manipulation (correct, but not easily provable by the compiler infrastructure) then I disagree. C is not a safe language and it should not protect the programmer. It is not acceptable to me that the compiler writers are overly conservative about which correct programs are expressible in C, and which can only be written in assembler. In C I am willing to take the risk of "silent data corruption" in exchange for the freedom to write the programs that I want to write. If for some project the risk/benefit equation weighs in a different direction I will choose a different language.
After that optimization pass, there is no store to q. C assumes that you cannot store to a different object without doing out-of-bounds pointer arithmetic, and that out-of-bounds pointer arithmetic is illegal; therefore it assumes that q is not written by that store.
The problem is that the optimizer replaced a plain store through q with an illegal out-of-bounds pointer-arithmetic store (because it was sneakily done in non-pointer land). That causes the "safety" checking (which can check less because of that invariant) to be insufficient. The problem is not leaking q, it is blindly replacing a legal construct with an "illegal" construct.
The specific problem here being the pointer cast which turned an integer into a pointer. You have no clue where that points without analyzing the value of that integer. As such, the assumption should be: "this can store anywhere, all loads everywhere in the program are suspect" when that occurs. You can then utilize pointer provenance to prove that "no, we do know where this can point to, the optimization/caching party is back on".
Leaking did not occur because they saved the address of q. "Leaking" occurred because they cast an integer to a pointer, which can "leak" a pointer to literally anything.
> Leaking did not occur because they saved the address of q. "Leaking" occurred because they cast an integer to a pointer, which can "leak" a pointer to literally anything.
I agree that you could choose semantics such that, as soon as you take the address of an object, any store anywhere could alias it, so it is no longer legal to ever cache the value unless you prove the store does not overlap.
But, the key thing here is that if the programmer wrote:
*(p+1) = 10;
print(q[0]);
It is entirely reasonable and legal for the compiler to optimize that to:
*(p+1) = 10;
print(0);
Because the abstract model says that stores to one object can not alias another object. You, as the programmer, are required to not author out-of-bounds stores and a good compiler could detect illegal constructs.
However, in this case, the optimizer transformed a legal, well-formed construct into an illegal construct and then optimized as if it had always been an illegal construct. Focusing on directly preventing such transforms, and on why such transforms should be illegal (pointing out the legal -> illegal thing they enable), minimizes the changes needed to the model.
> But, the key thing here is that if the programmer wrote *(p+1) = 10; print(q[0]); it is entirely reasonable and legal for the compiler to optimize that to *(p+1) = 10; print(0); Because the abstract model says that stores to one object can not alias another object.
If the programmer wrote that... but that's not what the programmer wrote. The programmer wrote *(char*)iq = 10 which was provably (!) equivalent to both *(char*)(uintptr_t)q = 10 as well as *(char*)(uintptr_t)(p+1) = 10. That means the compiler can't just replace it with the latter while ignoring the former -- it must carry the aliasing constraints implied by the former into the latter.
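Spelling out that chain of rewrites (my own annotation, in the same imitation-C the article uses):

    *(char *)iq = 10;                   // what the programmer wrote
    *(char *)(uintptr_t)(p + 1) = 10;   // iq replaced by the equal value ip == (uintptr_t)(p + 1)
    *(p + 1) = 10;                      // the integer round-trip cast dropped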
> However, in this case, the optimizer transformed a legal, well-formed construct into an illegal construct and then optimized as if it had always been an illegal construct. Focusing on directly preventing such transforms, and on why such transforms should be illegal (pointing out the legal -> illegal thing they enable), minimizes the changes needed to the model.
I see where the confusion is.
You're missing this sentence: "I am using C syntax here just as a convenient way to write programs in LLVM IR." The intermediate transformations you see there are not C code.
You're looking at the compiler as "optimize C to faster C, then codegen", in which case you can blame step 2 (the replacement of *(char*)(uintptr_t)(p+1) with *(p+1)) as the violating culprit, due to the aliasing rules.
Which... would be fine if you view it that way, I don't mind too much if you do.
But I'm looking at the compiler as "generate IR for C, then optimize the IR, then codegen" (which is closer to what LLVM does), in which case the "*(p+1) = 10" you see in the 2nd optimization step isn't C code anymore; it's LLVM IR that the author transcribed into a made-up language that superficially resembles C! You can't really say "this is a legal transformation" anymore, because it's not C code anymore; it's just some internal compiler gibberish that someone displayed with an imitation-C syntax to make it easier to read. The aliasing rules no longer apply there, because it's not actually C... it's LLVM IR.
Which is why I said "q is never written to" (step 3) is the violation. Because as I saw it, step 2 was fine, because it wasn't C, so it didn't have the aliasing semantics of C. In fact it would be illegal for the compiler to transform the original program into that and continue to treat that as equivalent C.
> step 2 was fine, because it wasn't C, so it didn't have the aliasing semantics of C.
LLVM still has its own aliasing semantics. It must, both so that it can exploit at least some of C's aliasing rules, and more importantly so that we can determine whether a given transformation is correct, without reference to the entire corpus of optimization passes.
The article describes three possible choices of aliasing semantics for LLVM IR, each of which makes a different optimization pass illegal. You are arguing for one particular choice, without addressing its downsides as also covered in the article.
That's sort of the crux of the issue: what constitutes exposing a pointer?
Half the time, within the compiler, a pointer-to-integer conversion is essentially a no-op cast instruction that can be optimized away if unused, deleted freely (e.g., a round-trip pointer-to-integer-to-pointer cast), or treated as transparent so that addressing information slides freely across it. The other half of the time, it's an important leaks-the-address operation that can't just be deleted because, you know, it causes the address to be leaked. It's a Schrödinger's side effect: it both matters and doesn't matter, and you don't know which until you open the box. It's a complete mess, and ultimately an entirely incoherent, inconsistent semantics that is begging for a fix.
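A minimal illustration of those two faces, as a sketch (global_addr is a hypothetical variable, only there to make the leak visible):

    #include <stdint.h>

    uintptr_t global_addr;              /* hypothetical sink for the address */

    int *round_trip(int *p) {
        return (int *)(uintptr_t)p;     /* the "no-op" face: optimizers like to fold this to just p */
    }

    void leak(int *p) {
        global_addr = (uintptr_t)p;     /* the "leaks the address" face: some later
                                           int-to-pointer cast may now recreate p */
    }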
Even worse, if you read all the compiler documentation, you'll find strong assertions that provenance is carried solely by data dependence (including through integers). The compiler's integer optimizations explicitly disclaim preservation of data dependence, which is why the leading model for pointer provenance (PNVI) is "integers don't have provenance, therefore pointer-to-int is an address-exposing operation." But even more annoyingly, the semantics at the IR level imply that memory is untyped, which kind of means every load or store of a pointer is an implicit integer conversion. That doesn't actually happen in practice, except that memory does frequently get converted to integers when you're just copying one block of memory to another. The upshot is a mess: several optimizations are broken in weird, unexpected ways that nobody really noticed until people started applying formal semantics to compiler optimizations and found these soundness holes.
And the most annoying part of all is the people sitting on the sidelines yelling at us "pointers are integers, why are you guys making this all so complicated?"
> That's sort of the crux of the issue: what constitutes exposing a pointer?
A lot of the people who subscribe to "optimization 3 is clearly wrong" make the argument that the answer is... everything is always exposed, whether it be (uintptr_t)0x42, some valid stack pointer, or anywhere else for that matter. This certainly tracks with the model that many C programmers (and systems programmers more generally) expect the language semantics to represent.
Making these assumptions disables a lot of optimizations, but the argument generally is that many of those optimizations were extremely tenuous in the first place and have dubious benefits outside of contrived benchmarks. This was also the point of the "restrict" and "register" keywords: to tell the compiler "hey, you can assume these things don't alias each other" or "hey, use a register for this variable", etc.
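For instance, restrict is exactly that kind of opt-in promise; a standard-C sketch (not code from the article):

    void scale(float *restrict dst, const float *restrict src, int n) {
        /* dst and src are promised not to overlap, so the compiler may keep src
           values in registers and reorder the loads and stores freely. */
        for (int i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }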
It's clear that a demand for this type of language semantics exists, as most large projects you can probably think of have flags that disable a lot of the alias assumptions (everything from Linux to Firefox to Clang (in some cases) to Chrome, etc) and some compilers (such as MSVC) don't bother with many of the alias assumptions in the first place (which affects anything built with those compilers).
When "making everything exposed" you can not even do register allocation. I hope you agree that this has some effect "outside of contrived benchmarks". Of course, you could require programmers to add "register" everywhere, but this ship has sailed decades ago.
Yeah, this does make the "read pointer from user input" case (which is actually sometimes useful) a bit weird, but it seems pretty obvious that ptr-to-int should probably be exposing at the very least.
> Yeah, this does make the "read pointer from user input" case (which is actually sometimes useful) a bit weird
I think it works out quite reasonably and elegantly, actually.
If you can prove that an address wasn't leaked -- then you can assume the program behavior is independent of any pointer read from user input, and thus optimize as if the pointer wasn't read from user input.
Otherwise, you assume the address was leaked, and thus must act as if the target might overlap said address.
> but it seems pretty obvious that ptr-to-int should probably be exposing at the very least
I don't think that's necessary, but it's certainly a valid way to write a compiler. (I say this because I don't think (void)(uintptr_t)ptr; should be considered exposing, for example.)
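Concretely, the distinction I have in mind looks something like this (printf is only there to make the escape visible):

    #include <stdint.h>
    #include <stdio.h>

    void demo(char *ptr) {
        (void)(uintptr_t)ptr;                        /* result discarded immediately:
                                                        arguably need not count as exposing */
        printf("%ju\n", (uintmax_t)(uintptr_t)ptr);  /* the address actually escapes:
                                                        this one clearly must count as exposing */
    }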
I don't really get if you're agreeing or disagreeing with me (or just discussing something else entirely), but to answer this bit:
> That's sort of the crux of the issue: what constitutes exposing a pointer?
Shouldn't the answer be that an exposure of a (source) pointer P1 to some (target) pointer P2 is "any potential dependence of P2 on P1"?
If you can prove the two are independent, then you can assume P1 is not exposed to P2. That is the case when, for example, the address P1 is never used anywhere. It is also the case when P1 is converted to an integer, but the integer is discarded immediately. It's up to you how deeply you want to analyze the dataflow -- and it's perfectly fine if you want to assume integers lack provenance -- but whatever you do needs to be consistent with the above.
All of which is to say, I don't see how "what constitutes exposing a pointer" would be a hard question.
> All of which is to say, I don't see how "what constitutes exposing a pointer" would be a hard question.
It's a hard question because any coherent semantics you come up with breaks optimizations that are obviously correct™. This is difficult to convey in toy examples, because in toy examples, there's clearly enough information present that the optimizer could do the right thing™ if it were smart enough.
The other thing to keep in mind is that optimization generally works on semi-mangled forms of the source code, so if you have a pointer-to-integer that has no use at the IR level, that doesn't necessarily mean that the pointer-to-integer had no use at the source code level (and same for the converse, incidentally--the IR might materialize uses that didn't exist at the source code level).
> It's a hard question because any coherent semantics you come up with breaks optimizations that are obviously correct™. This is difficult to convey in toy examples, because in toy examples, there's clearly enough information present that the optimizer could do the right thing™ if it were smart enough.
I don't buy this (particularly your first sentence). I need to see it to believe it.™ I'm not saying it's not the case, just that it's kind of a tough sell when the illustration fails basic scrutiny.
But if it's actually true, then that's a reason to stop using toy examples in explanations, not a justification for plowing ahead with obviously wrong toy arguments.
Imagine if laws worked this way? "As you can see here, you actually ran a red light." "But the light is literally green?!" "It's difficult to show in this picture, but it is red. Trust me bro, we wouldn't catch obvious criminals if we only addressed the cases where it was obviously red." Uhm, what?
A good mental exercise with these kinds of examples is to imagine splitting up the parts of the program that each optimization looked at, so that they live in different translation units.
This prevents your mind from playing games with what exactly the optimization can know. The article tries to do something similar by fully writing out the output of each optimization as a separate program.
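One way to picture the exercise (a hypothetical split, not the article's code): put the store and the read in separate translation units, so each pass really does see only what its own unit contains.

    /* tu1.c -- all this unit ever sees */
    void do_store(char *p) {
        *(p + 1) = 10;      /* locally, this looks like a (possibly out-of-bounds) store relative to p */
    }

    /* tu2.c -- all this unit ever sees */
    char q[1] = {0};
    int read_q(void) {
        return q[0];        /* locally, nothing in this unit ever writes to q */
    }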
> A good mental exercise with these kinds of examples is to imagine splitting up the parts of the program that each optimization looked at, so that they live in different translation units. This prevents your mind from playing games with what exactly the optimization can know. The article tries to do something similar by fully writing out the output of each optimization as a separate program.
I already do that in my mind, and my mind isn't playing games. If you have a specific point to rebut please do so. Though please read https://news.ycombinator.com/item?id=42907292 in its entirety beforehand. It seems we are reading those intermediate outputs differently.
My point was that toy examples like this are useful as long as you think about them consistently. I wasn't trying to make any claims about your concrete argument in particular.
> My point was that toy examples like this are useful as long as you think about them consistently. I wasn't trying to make any claims about your concrete argument in particular.
Actually I'd suggest the law analogy illustrates the problem with your approach.
"You agree on the principle that I am Innocent until proved Guilty right?" Sure. "And I haven't been proved guilty, have I, so therefore I'm innocent right?" And that's why we need a trial, "But it makes no sense to try the innocent".
> Actually I'd suggest the law analogy illustrates the problem with your approach. "You agree on the principle that I am Innocent until proved Guilty right?" Sure. "And I haven't been proved guilty, have I, so therefore I'm innocent right?" And that's why we need a trial, "But it makes no sense to try the innocent".
Your statement makes no sense. The semantic analysis is the trial. The optimization is the sentencing. The compiler is making a false assumption during trial and then proceeding with sentencing as if the code was guilty when the evidence clearly pointed otherwise.
(I'm not about to digress into a random quibble about an analogy so this is my last comment on that.)
In the input to the final optimization, there is only one store between the place where q is allocated and the place where q is used.
That one store is clearly identified as going into p, not into q.
Therefore, it's quite reasonable to conclude that q has not been changed.
If we weren't allowed to conclude that, we would almost never be able to put stack variables into registers, and that would make even simple programs run a lot slower.
(The other point, which is another argument that it's the second optimization that should be considered incorrect, is that the program before the final optimization has a store to p[1], which is an out-of-bounds access into an array, which should be considered an error in all reasonable programming languages.)
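For reference, the input to that final optimization looks roughly like this in the article's imitation-C (my paraphrase, not a verbatim quote):

    char p[1], q[1] = {0};
    uintptr_t ip = (uintptr_t)(p + 1);
    uintptr_t iq = (uintptr_t)q;
    if (ip == iq) {
        *(p + 1) = 10;      // the only store before q is used, and it names p, not q
        print(q[0]);        // so replacing q[0] with 0 looks locally justified
    }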
> In the input to the final optimization, there is only one store between the place where q is allocated and the place where q is used. That one store is clearly identified as going into p, not into q.
> Therefore, it's quite reasonable to conclude that q has not been changed.
No, that is in fact unreasonable and does not follow at all.
You are completely ignoring any relationships the program might have established between the two at the point at which *(p+1) is written to. [1]
Namely, you are ignoring that the program has already established iq == ip at that point. Which means the program already established (uintptr_t)q == (uintptr_t)(p+1). Which means the program already established q == p + 1. (NOTE: I am not writing C code here. See [1].) Which means *q == *(p + 1). Thus in the last optimization you must assume *q may have been modified. Of course you (the compiler) are welcome to give up on analyzing that chain of logic at any point at which it seems too difficult, but if you break that chain of logic, you cannot make violating assumptions in a subsequent step.
You are making the assumption that all optimizations see everything, but that's not the case. The third optimization never got a chance to observe that relationship. This means that either:
- the third optimization should assume that _any_ relationship between two expressions could have been established before it runs (even if that relationship was never established!)
- one of the first two optimizations is also wrong because it did not give subsequent optimizations the chance to observe that relationship.
> one of the first two optimizations is also wrong because it did not give subsequent optimizations the chance to observe that relationship.
If you view them as actual equivalent C code, sure. In which case, be my guest at saying those are wrong. But I think you're conflating actual C with the imitation-C used to represent the LLVM IR in the article. I responded to this here: https://news.ycombinator.com/item?id=42907292
You mention that the compiler should track the aliasing information, so as to enrich this imitation-C language, but that seems to be exactly what the article means by "provenance".
Important to note that in the years since this post was written, provenance has become an explicitly accepted consideration of the Rust memory model ( https://doc.rust-lang.org/std/ptr/index.html#provenance ), and standard APIs have been stabilized for working with pointers in ways that preserve provenance: https://doc.rust-lang.org/std/ptr/index.html#strict-provenan...