
This is fundamentally the same thing as undefined behavior, regardless of whether Odin insists on calling it by a different name. If you don't want behavior to be undefined, you have to define it, and every part of the compiler has to respect that definition. If a use-after-free is not undefined behavior in Odin, what behavior is it defined to have?

As a basic example, if the compiler guarantees that the write will result in a deterministic segmentation fault, then that address must never be reused by future allocations (including stack allocations!), and the compiler is not allowed to perform basic optimizations like dead store elimination and register promotion for accesses to that address, because those can prevent the segfault from occurring.

If the compiler guarantees that the write will result in either a segfault or a valid write to that memory location, depending on the current state of the allocator, what guarantees does the compiler make about those writes? If some other piece of code is also performing reads and writes at that location, is the write guaranteed to be visible to that code? This essentially rules out dead store elimination, register promotion, constant folding, etc. for both pieces of code, because those optimizations can prevent one piece of code from observing the other's writes. Worse, what if the two pieces of code are on different threads? And so on.
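To make the visibility question concrete, here is a minimal C sketch (the function name `last_write_wins` is made up for illustration; it is not from Odin or from the thread). An optimizer is normally free to drop the first store and to return the constant without reloading, so a second piece of code poking at the same location may never see the value 1, and this function may never see a value the other code wrote in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only. */
int last_write_wins(int *slot) {
    assert(slot != NULL);
    *slot = 1;     /* dead store: nothing reads it before the next write */
    *slot = 2;
    return *slot;  /* may be folded to the constant 2, with no reload */
}
```

For a valid `slot` the result is the same with or without those optimizations. The two versions only differ once something else is allowed to observe or modify `*slot` behind the function's back.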

If the compiler doesn't guarantee a deterministic crash, and it doesn't guarantee whether or not the write is visible to other code using the same region of memory, and it doesn't provide any ordering or atomicity guarantees for the write if it does end up being visible to other code, and then it performs a bunch of optimizations that can affect all of those things in surprising ways: congratulations, your language has undefined behavior. You can insist on calling it something else, but you haven't changed the fundamental situation.




Your language has behavior not defined within the language, sure. What it does not now have is permission for the compiler to presume that the code never executes with input that would trigger that behavior.


The compiler is already doing that when it performs any of the optimizations I mentioned above. When the compiler takes a stack-allocated variable (whose address is never directly taken) and promotes it to a register, removes dead stores to it, or constant-folds it out of existence, it does so under the assumption that the program is not performing aliasing loads and stores to that location on the stack. In other words, it is leaving the behavior of a program that performs such loads and stores undefined, and in doing so it is directly enabling some of the most basic, pervasive optimizations that we expect a compiler to perform.
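A small C sketch of the kind of code this applies to (the name `sum_below` is hypothetical, and the bound assumes a 32-bit `int`). Neither `acc` nor `i` ever has its address taken, so a compiler will typically keep both in registers, or replace the loop with a closed-form expression, and neither variable may ever exist at a stack address at all.

```c
#include <assert.h>

/* Hypothetical example: sums 0 + 1 + ... + (n - 1). */
int sum_below(int n) {
    assert(n <= 65536);  /* keeps the sum within range, assuming 32-bit int */
    int acc = 0;         /* address never taken: free to live in a register */
    for (int i = 0; i < n; i++) {
        acc += i;
    }
    return acc;
}
```

Code that guessed at the stack slot of `acc` and wrote to it through a raw pointer would see different results depending on which of these choices the compiler made, which is exactly the behavior being left undefined.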

In a language with raw pointers, essentially all optimizations rely on this type of assumption. Forbidding the compiler from making the assumption that undefined behavior will not occur essentially amounts to forbidding the compiler from optimizing at all. If that is indeed what you want, then what you want is something closer to a macro assembler than a high-level language with an optimizing compiler like C. It's a valid thing to want, but you can't have your cake and eat it too.


When you put it like that, it's actually interesting. If they went ahead and said, "This is a language which by design can't have an optimizing compiler, it's strictly up to the programmer - or the code generator, if used as an intermediate language - to optimize" then it would at least be novel.

But as they don't, I see it more as an attempt to annoy the people who have studied these sorts of things (I guess you are the people who "suck the joy out of programming" in their eyes).


No, the compiler is not "already" doing that. Odin uses LLVM as a backend (for now) and it turns off some of those UB-driven optimizations (as mentioned in the OP).

Some things are defined by the language, some things are defined by the operating system, some by the hardware.

It would be silly for Odin to say "you can't access a freed pointer" because it would have to presume to know ahead of time how you utilize memory. It does not. In Odin, you are free to create an allocator where the `free` call is a no-op, or it just logs the information somewhere without actually reclaiming the 'freed' memory.
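A rough sketch of that idea, written in C rather than Odin and not using Odin's actual allocator interface (every name here, `Arena`, `arena_alloc`, `arena_free`, and so on, is made up for illustration). It is a bump allocator whose free is a no-op, so reading through a "freed" pointer is an ordinary, valid access until the whole arena is destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump allocator; not Odin's allocator API. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

Arena arena_create(size_t capacity) {
    Arena a = { malloc(capacity), capacity, 0 };
    if (a.base == NULL) a.capacity = 0;
    return a;
}

void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}

/* `align` must be a power of two no larger than malloc's own alignment. */
void *arena_alloc(Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->capacity || size > a->capacity - start) return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Freeing is deliberately a no-op: the memory stays valid. */
void arena_free(Arena *a, void *ptr) {
    (void)a;
    (void)ptr;
}

/* Reads a value after it was "freed"; returns -1 on allocation failure. */
int demo_read_after_free(void) {
    Arena a = arena_create(64);
    if (a.base == NULL) return -1;
    int *p = arena_alloc(&a, sizeof *p, _Alignof(int));
    if (p == NULL) { arena_destroy(&a); return -1; }
    *p = 42;
    arena_free(&a, p);  /* no-op, so *p is still a live object */
    int seen = *p;
    arena_destroy(&a);
    return seen;
}
```

With this allocator the access after `arena_free` is well defined even by C's rules, since the underlying `malloc` block is still live. Whether the same holds for a given allocator is a property of that allocator, not of the `free` call site.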

I can't speak for gingerBill but I think one of the reasons to create the language is to break free from the bullying of spec lawyers who get in the way of systems programming and suck all the joy out of it.

> it does so under the assumption that the program is not performing aliasing loads and stores to that location on the stack

If you write code that tries to get a pointer to the first variable in the stack, and guess the stack size and read everything in it, Odin does not prevent that, it also (AFAIK) does not prevent the compiler from promoting local variables to registers.

Again, go back to the twitter thread. An explicit example is mentioned:

https://twitter.com/TheGingerBill/status/1496154788194668546

If you reference a variable, the language spec guarantees that it will have an address that you can take, so there's that. But if you use that address to try to get other stack variables indirectly, then the language does not define what happens in a strict sense, but it's not 'undefined' behavior. It's a memory access to a specific address. The behavior depends on how the OS and the hardware handle that.

The compiler does not get to look at that and say "well this looks like undefined behavior, let me get rid of this line!".


> If you write code that tries to get a pointer to the first variable in the stack, and guess the stack size and read everything in it, Odin does not prevent that, it also (AFAIK) does not prevent the compiler from promoting local variables to registers.

This is exactly what I described above. Odin does not define the behavior of a program which indirectly pokes at stack memory, and it is thus able to perform optimizations which exploit the fact that that behavior is left undefined.

> The compiler does not get to look at that and say "well this looks like undefined behavior, let me get rid of this line!".

This is a misleading caricature of the relationship between optimizations and undefined behavior. C compilers do not hunt for possible occurrences of undefined behavior so they can gleefully get rid of lines of code. They perform optimizing transformations which are guaranteed to preserve the behavior of valid programs. Some programs are considered invalid (those which execute invalid operations like out-of-bounds array accesses at runtime), and those same optimizing transformations are simply not required to preserve the behavior of such programs. Odin does not work fundamentally differently in this regard.
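The textbook C illustration, as a sketch (function names are made up for this example): a signed comparison that an optimizer may fold to a constant. The folded version agrees with the original for every input on which the original is valid. The only input where they could differ is `INT_MAX`, where `x + 1` overflows and C leaves the behavior undefined.

```c
#include <assert.h>
#include <limits.h>

/* As written in the source. Signed overflow at x == INT_MAX is UB in C. */
int successor_is_greater(int x) {
    return x + 1 > x;
}

/* What an optimizer is allowed to turn it into. */
int successor_is_greater_folded(int x) {
    (void)x;
    return 1;
}

/* For any valid input (x < INT_MAX) the two versions agree. */
void check_equivalent(int x) {
    assert(x < INT_MAX);  /* stay inside the set of valid executions */
    assert(successor_is_greater(x) == successor_is_greater_folded(x));
}
```

The compiler never "detects UB" here. It applies a rewrite that is correct for all valid executions and simply makes no promise about the invalid one.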

If you want to get rid of a particular source of undefined behavior entirely, you either have to catch and reject all programs which contain that behavior at compile time, or you have to actually define the behavior (possibly at some runtime cost) so that compiler optimizations can preserve it. The way Odin defines the results of integer overflow and bit shifts larger than the width of the operand is a good example of the latter.
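As a sketch of what "actually define the behavior" can look like, here are C helpers that give every input a defined result. The specific choices (two's-complement wrapping for addition, zero for oversized shifts) are assumptions picked for illustration; the thread does not spell out Odin's exact rules, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Addition that wraps modulo 2^32 for every pair of inputs. */
int32_t add_wrapping(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;  /* unsigned arithmetic wraps */
    int32_t out;
    memcpy(&out, &sum, sizeof out);  /* reinterpret bits; int32_t is two's complement */
    return out;
}

/* Left shift that yields 0 once the shift count reaches the bit width. */
uint32_t shl_defined(uint32_t x, uint32_t n) {
    return n < 32 ? x << n : 0;  /* the extra compare is the runtime cost */
}

void overflow_examples(void) {
    assert(add_wrapping(INT32_MAX, 1) == INT32_MIN);
    assert(shl_defined(1u, 31) == 0x80000000u);
    assert(shl_defined(1u, 32) == 0);
}
```

Once the result is pinned down like this, an optimizer has to preserve it for all inputs, so rewrites that rely on "overflow never happens" are no longer available for these operations.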

C does have a particularly broad and programmer-hostile set of UB-producing operations, and I applaud Odin both for entirely removing particular sources of UB (integer overflow, bit shifts) and for making it easier to avoid it in general (bounds-checked slices, an optional type). These are absolutely good things. However, I consider it misleading and false to claim that Odin has no UB whatsoever; you can insist on calling it something else, but that doesn't change the practical implications.


> They perform optimizing transformations which are guaranteed to preserve the behavior of valid programs. Some programs are considered invalid (those which execute invalid operations like out-of-bounds array accesses at runtime), and those same optimizing transformations are simply not required to preserve the behavior of such programs.

I think this is the core of the problem and it's why people don't like these optimizations and turn them off.

Again, I'm not the Odin designer nor a core maintainer, so I can't speak on behalf of the language, but from what I understand, Odin's stance is that the compiler may not make assumptions about what kind of code is invalid and whose behavior therefore need not be preserved by the transformations it makes.


> C compilers do not hunt for possible occurrences of undefined behavior so they can gleefully get rid of lines of code.

Yes, they do. If they detect UB, they consider the result poisoned and delete any code that depends on it.


> The compiler does not get to look at that and say "well this looks like undefined behavior, let me get rid of this line!".

No production compiler does that (directly). This is silly. We want to help programmers. Compilers sometimes keep such code even if it is known to be UB, just because removing it is unlikely to help optimizations.

But if you are optimizing assuming something does not happen, then you have undefined behavior. And you are always assuming something does not happen when optimizing.


> The compiler is already doing that when it performs any of the optimizations I mentioned above. When the compiler takes a stack-allocated variable (whose address is never directly taken) and promotes it to a register, removes dead stores to it, or constant-folds it out of existence, it does so under the assumption that the program is not performing aliasing loads and stores to that location on the stack. In other words, it is leaving the behavior of a program that performs such loads and stores undefined, and in doing so it is directly enabling some of the most basic, pervasive optimizations that we expect a compiler to perform.

No, that's C-think. Yes, when you take a stack-allocated variable and do those transformations, you must assume away the possibility that there are aliasing accesses to its location on the stack. Thus, those are not safe optimizations for the compiler to perform on a stack-allocated variable.

It's not something you have to do. The model of treating each variable as stack-allocated until proven (potentially fallaciously) otherwise is distinctly C brain damage.

> If that is indeed what you want, then what you want is something closer to a macro assembler than a high-level language with an optimizing compiler like C. It's a valid thing to want, but you can't have your cake and eat it too.

This is a false dichotomy advanced to discredit compilers outside the nothing-must-be-faster-than-C paradigm, and frankly a pretty absurd claim. There are plenty of "high-level" but transparent language constructs that can be implemented without substantially assuming non-aliasing. It's totally possible to lexically isolate raw pointer accesses and optimize around them. There is a history of computing before C! Heck, there are C compilers with "optimization" sets that don't behave as pathologically awfully as mainstream modern compilers do when you turn the "optimizations" off; you have to set a pretty odd bar for "optimizing compiler" to make that look closer to a macro assembler.

It's okay if your compiler can't generate numerical code faster than Fortran. That's not supposed to be the minimum bar for an "optimizing" compiler.


We are talking about Odin, a language aiming to be 'better C' the way Zig is. The literal only reason anyone uses C is to write code that runs as fast as possible, whether for resource-constrained environments or CPU-bound hot-paths. Odin has many features that one would consider warts if you weren't in an environment where you'd otherwise turn to C, such as manual memory freeing. If I were pre-committing to a language that runs five times slower than C, I have no reason to select Odin over C#, a language that runs only ~2.4 times slower than C.


> The model of treating each variable as stack-allocated until proven (potentially fallaciously) otherwise is distinctly C brain damage.

OK, let's consider block-local variables to have indeterminate storage location unless their address is taken. It doesn't substantively change the situation. Sometimes the compiler will store that variable in a register, sometimes it won't store it anywhere at all (if it gets constant-folded away), and sometimes it will store it on the stack. In the last case, it will generate and optimize code under the assumption that no aliasing loads or stores are being performed at that location on the stack, so we're back where we started.



