Hacker News new | comments | show | ask | jobs | submit login

This reminds me of a recent conversation I had with someone about Nim: I suggested it to them and they rejected it based on the premise that compiling to C is "bad" because of undefined behaviour. But now I find out that even LLVM IR has undefined behaviour...



AFAIK Gödel's incompleteness theorems imply that _any_ language will have at least some undefined behaviour.


No, you're confusing "undefined" with "unprovable". There are several "safe" languages where every well-formed program has well-defined semantics (for example, most interpreted languages).

Undefined behavior means that there are some well-formed programs that have semantic holes: the specification says that is up to you to avoid that your program gets to some illegal state. If your program does, the language says "I told you not to get there. Now I can do whatever I want". The reason is that by assuming that the program cannot reach those states, better code can be generated (think bounds checking).

What Godel's theorem says (and the more CS-specific version of it is Rice's theorem [1]) is that you cannot have an algorithm that proves any given property about the semantics of a program. But if the language is safe, the property of being well-defined is trivial (all the programs have it), and trivial properties are the only exceptions to the theorem.

[1] https://en.wikipedia.org/wiki/Rice's_theorem


Correct. Rice's theorem limits the ability to check a program, e.g. with a compiler.


I think you're stretching Gödel too far. Perhaps most "fundamentally", it speaks to a limitation on finite first-order logic systems, and "computer languages" generally don't impose a finiteness constraint (a language which does would not be Turing Complete).

Though if I'm missing the point you're making, it would probably be quite interesting to see it in more depth.


Hardly. It'll imply that any language could get stuck or fail to terminate but UB is something different.


That would be Turing's Halting problem, but yeah, undecidables permeate both concepts.


Undecidable and undefined are completely different concepts. Note Turing's original proof operates on "Turing Machines"; they are completely, mathematically-rigorously defined, but have some questions you can ask about them that are undecidable. Arguably part of the point of the proof is precisely to show that definedness does not imply decideability, a thing which we now take for granted but was still something mathematics as a human endeavor was working through at the time. (If I recall my timelines correctly, this was not the first such demonstration, but the old habits had not yet died and the understanding not yet fully percolated through.)

Nothing stops you from creating a computer language that has absolutely no undefined behavior except the general difficulty of fully defining behavior and the practical issues involved with possibly overspecifying your behavior to the point it can't be practically implemented on real hardware. You also face the possibility that your model of the real hardware may also be flawed, if for no other reason than because of CPU bugs, so in practice you may not be able to manifest your fully-specified language 100.00000%. But it's not in principle impossible.


I completely agree with the gist of your comment, but you're confusing "undefined", in the sense meant in the article, with "unspecified".

"Undefined" refers to certain properties of certain languages, like C, where an operation can produce any of an unenumerated set of behaviors, including behaviors that go well beyond normal program behavior. For example, in C, writing to a pointer that's pointing outside of allocated application memory can do things like change the current function's return address; in C++ it may change an object's vtable. The set of possible behaviors that can ensue is too large to enumerate and depends heavily on unforeseen factors.

What you're referring to is unspecified behavior. For example, in C, calling a function like so: foo(i++, i++) does not specify the values of the parameters to the function, but they can be one of a small set of possibilities, and cannot cause undefined behavior (like changing the function's return address etc.).

Creating a language with no undefined behavior is quite easy and is the norm in many high-level languages. Fully specifying the behavior of the language is harder, and you're right that often there is intentionally unspecified -- i.e., implementation dependent -- behavior to allow for efficient implementations on different platforms by the language's compiler/runtime, OS or hardware. For example, in Java, changing the values of non-volatile fields concurrently by multiple threads has an unspecified result, and this is intentional.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: