
Undefined Behavior != Unsafe Programming - steveklabnik
http://blog.regehr.org/archives/1467
======
Terr_
> Undefined behavior is the result of a design decision: the refusal to
> systematically trap program errors at one particular level of a system. The
> responsibility for avoiding these errors is delegated to a higher level of
> abstraction.

Sounds a bit like an argument for Checked Exceptions: If you're going to write
code for one layer of abstraction, your code should only emit errors that
match its own tier. (And not raw naked primeval stuff occurring many layers
down.) ( [http://imgur.com/iYE5nLA](http://imgur.com/iYE5nLA) )

~~~
ktRolster
It's also not entirely true... undefined behavior also exists because it's a
serious pain to define every detail of a language. The only language I know of
that is completely defined (other than toy languages) is ML... and the
authors felt the work to completely define it wasn't entirely worth the
effort.

~~~
pron
Java is also completely defined and pretty completely specified. But don't
confuse unspecified behavior with undefined behavior. Unspecified behavior
means that one of several things can happen, but it's unspecified which.
Undefined behavior means that absolutely anything can happen, including things
that are not part of normal application behavior.

~~~
Terr_
> Unspecified behavior

I feel like "unspecified" still sounds too lax. How about "implementation-
defined"?

I'm thinking of scenarios like: "When this occurs, the Java Virtual Machine
implementation may choose to do either X or Y. If it does Y, then it shall
throw a Z exception."

------
dom96
This reminds me of a recent conversation I had with someone about Nim: I
suggested it to them and they rejected it based on the premise that compiling
to C is "bad" because of undefined behaviour. But now I find out that even
LLVM IR has undefined behaviour...

~~~
l0b0
AFAIK Gödel's incompleteness theorems imply that _any_ language will have at
least some undefined behaviour.

~~~
ot
No, you're confusing "undefined" with "unprovable". There are several "safe"
languages where every well-formed program has well-defined semantics (for
example, most interpreted languages).

Undefined behavior means that some well-formed programs have semantic holes:
the specification says it is up to you to keep your program from reaching
certain illegal states. If it reaches one anyway, the language says "I told
you not to get there. Now I can do whatever I want". The reason is that by
assuming the program cannot reach those states, the compiler can generate
better code (think eliding bounds checks).

What Gödel's theorem says (and the more CS-specific version of it is Rice's
theorem [1]) is that you cannot have an algorithm that decides an arbitrary
non-trivial property of a program's semantics. But if the language is safe,
the property of being well-defined is trivial (all programs have it), and
trivial properties are exactly the exceptions to the theorem.

[1]
[https://en.wikipedia.org/wiki/Rice's_theorem](https://en.wikipedia.org/wiki/Rice's_theorem)

~~~
paulddraper
Correct. Rice's theorem limits the ability to check a program, e.g. with a
compiler.

------
tel
The real heart here is that program analyses have opened doors to
optimizations that cannot be represented at the level of the language (LLVM
IR). This is super convenient for compiler authors who want a flexible IR for
optimization purposes. It's the same thing that happens with C UB too:
optimizations will take advantage of what the compiler knows, not what the
programmer believes.

------
lmm
> The essence of undefined behavior is the freedom to avoid a forced coupling
> between error checks and unsafe operations.

Maybe they should remain coupled - after all, they're intimately related: the
error check is what makes the "unsafe" operation reasonable. For a program to
remain correct, it is vital that the error check remains adequately coupled to
the undefined behaviour it's preventing - e.g. if an operation that would do
something weird on overflow is being used, the link to our reason for
believing that overflow can't happen in this case should be made explicit. It
should be possible to do this in a way that has zero overhead in the final
machine code (e.g. a richer type system at the LLVM bytecode level).
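One way to read that suggestion, sketched in C: make the unsafe operation reachable only through the guard that justifies it, so the coupling is structural rather than a comment. (`checked_div` is a hypothetical helper, not from the article.)

```c
#include <limits.h>

/* The two cases where C's signed division is undefined (division by
   zero, and INT_MIN / -1 overflowing) are excluded by the guard; the
   division itself is only reachable when it is provably defined. */
int checked_div(int num, int den, int *out) {
    if (den == 0 || (num == INT_MIN && den == -1))
        return 0;        /* caller must handle the error explicitly */
    *out = num / den;    /* no UB possible on this line */
    return 1;
}
```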

------
kzrdude
> Either way, UB at the LLVM level is not a problem for, and cannot be
> detected by, code in the safe subsets of Swift and Rust.

But this is not true; there are things that leak through and cause big
headaches for the designers of Rust.

~~~
ithkuil
designers of Rust or implementors of the rust compiler?

(honest question)

~~~
wyldfire
I think the two sets (designers of the Rust language) and (rustc implementors)
intersect really well.

I think that the distinction should matter more if/when someone tries another
implementation of the language.

------
tomp
> Undefined behavior is the result of a design decision: the refusal to
> systematically trap program errors at one particular level of a system.

Huh? This is quite a bit of a false dichotomy... Illegal operations could also
result in _unspecified_ behaviour (e.g. it's not specified what result an
integer overflow gives, but the rest of the program must continue normally).

> unsafety of machine code

In what way is the machine code unsafe? AFAIK the CPU will always try to
execute the code, the worst that can happen is some kind of a trap.

~~~
pasta
I'm not sure I understand you. What is the difference between undefined and
unspecified behavior?

If it was decided that an overflow would generate an error, then it was a
design decision to trap errors at that level. The program could crash so that
everybody knows something is wrong.

If it was decided that an overflow would just overflow, then it sounds like a
refusal to trap the error. The program could continue in an unexpected state.

Maybe it's better to crash the program so you know something is wrong.

~~~
tom_mellior
> I'm not sure I understand you. What is the difference between undefined and
> unspecified behavior?

They explained in the very next sentence: "e.g. it's not specified what result
an integer overflow gives, but the rest of the program must continue
normally".

(In the C standard that behavior would be "implementation-defined": the
compiler must pick a behavior and document it. "Unspecified" is looser still:
the standard lists the possibilities, but the implementation need not document
which one it picks.)

------
johncolanduoni
> For example, it is obvious that a safe programming language can be compiled
> to machine code, and it is also obvious that the unsafety of machine code in
> no way compromises the high-level guarantees made by the language
> implementation.

I agree with the thrust of the statement, but this isn't nearly as simple as
is stated. Compiler bugs exist, and the more standards your code has to
interact with on the way to machine code (e.g. C spec => LLVM spec => X86
spec) the dicier this becomes, not to mention the assumptions about the APIs
exposed to you by your OS and libraries you use.

------
catnaroek
Duly upvoted, but I'd like to note that the problem with so-called “unsafe
languages” isn't undefined behavior per se, but rather the lack of appropriate
tools (e.g. a formal axiomatic semantics) to reliably write programs with no
undefined behavior.

~~~
solidsnack9000
As long as you use the new semantics, the unsafe parts of the language are
barred to you, so it's like a different language.

~~~
catnaroek
I'm not talking about changing languages, but rather about giving existing
languages more rigorous specifications. My beef with C, C++, unsafe Rust, etc.
isn't the presence of undefined behavior in arbitrary programs (for which the
solution is very simple: don't write arbitrary programs!), but rather the
absence of a _calculational method_ [0] for designing programs that don't have
undefined behavior (for which a formal semantics is an indispensable
prerequisite). To convince yourself that a C program doesn't have undefined
behavior, you need to interpret the _English text_ of the C standard, and
that's what I object to.

[0] The calculations needn't be performed by a machine. If they are performed
by a machine, you have a safe language. If they must be performed by humans,
you have an unsafe language, which is nevertheless a lot more usable than C
and friends.

~~~
beached_whale
One can use clang and gcc to produce a runtime error when undefined behaviour
is detected, or to go further and abort the program.

There are several things, like signed integer overflow, that often cannot be
detected at compile time.

~~~
johncolanduoni
I think the OP is referring to full-blown verification systems, like some
dependently typed languages have (although they aren't the only ones). If your
program's behavior is defined because you wrote it that way (i.e. not by
chance), you have an argument for why the undefined behavior is not invoked,
and that argument could be translated into a formal proof in most sufficiently
powerful (i.e. ZFC-ish) formal systems.

~~~
catnaroek
Formal verification, yes. Mechanical verification (dependent types, SAT/SMT
solvers, model checkers, etc.), not necessarily. I don't mean to insult the
efforts of the mechanical verification community, but it's preposterous to
suggest that the only way to verify programs is to use a computer. Hey, we
have brains! We can use them too!

------
gus_massa
HN title filter ate the "!". The title must say "!=" instead of "=".

~~~
steveklabnik
Gah! Thanks. I guess I should use the words in the future. I'm used to
copy/paste-ing titles, given the HN guidelines.

~~~
greglindahl
... you know you can edit your article title at any time ...

~~~
steveklabnik
I don't have the option to right now. So not literally any time.

~~~
pygy_
[https://news.ycombinator.com/edit?id=13648333](https://news.ycombinator.com/edit?id=13648333)
maybe? I think that's how the mods access it IIRC.

~~~
steveklabnik
Nope, doesn't have a way of editing.

