
Undefined vs. Unsafe in Rust - ingve
https://manishearth.github.io/blog/2017/12/24/undefined-vs-unsafe-in-rust/
======
solidsnack9000
Unsafe code might result in undefined behaviour. Code in an `unsafe` block,
however, is code that is not compiler checked to be safe, but hopefully was
hand checked to be safe.

The `unsafeXYZ` idiom is very common in the functional programming world. For
individual functions, prefixing them with `unsafe` generally means that this
function could result in undefined behaviour. However, there is the other idea
that we mark something as `unsafe` to ask the compiler not to check it and
just to trust us (`unsafePerformIO` is often used this way). Maybe having a
second term would make this clearer.
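A minimal sketch of the two meanings side by side: an `unsafe fn` ("may cause UB if you break its precondition") and an `unsafe` block ("I checked this by hand, trust me"). The function names here are hypothetical, chosen for illustration only.

```rust
/// Returns the first byte of `bytes` without a bounds check.
///
/// # Safety
/// The caller must guarantee that `bytes` is non-empty; calling this on an
/// empty slice is undefined behaviour. This is the "unsafe to call" meaning.
unsafe fn first_unchecked(bytes: &[u8]) -> u8 {
    // SAFETY: the caller guarantees `bytes` is non-empty.
    unsafe { *bytes.get_unchecked(0) }
}

/// A safe wrapper. The `unsafe` block is the "trust me, I checked" meaning,
/// and the emptiness check is what makes that trust justified.
fn first(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        None
    } else {
        // SAFETY: we just checked that `bytes` has at least one element.
        Some(unsafe { first_unchecked(bytes) })
    }
}

fn main() {
    println!("{:?}", first(b"rust")); // Some(114), since b'r' == 114
    println!("{:?}", first(b""));     // None
}
```

Callers of `first` never write `unsafe` themselves; the obligation is discharged once, inside the wrapper.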

~~~
atomashpolskiy
It is in fact checked by both the compiler and the borrow checker. The only
difference between safe and unsafe code is that the latter allows
dereferencing raw pointers, calling unsafe functions, and similar direct
manipulation of memory, which may result in segfaults, memory leaks and data
races.
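A small sketch of the raw-pointer case: creating a raw pointer is ordinary safe code, and only the dereference needs `unsafe`. The helper name `bump_through_raw` is hypothetical.

```rust
/// Increments an integer by writing through a raw pointer.
fn bump_through_raw(x: &mut i32) {
    // Creating a raw pointer from a reference is allowed in safe code...
    let p = x as *mut i32;

    // ...but dereferencing it requires `unsafe`, because the compiler can
    // no longer prove the pointer is valid and unaliased.
    // SAFETY: `p` comes from a live `&mut i32` that is not used again here.
    unsafe {
        *p += 1;
    }
}

fn main() {
    let mut v = 5;
    bump_through_raw(&mut v);
    println!("{}", v);
}
```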

~~~
solidsnack9000
Maybe there is some checking but there is clearly somewhat less. This is in
keeping with the general idiom (shared with Haskell and Swift, among others).

~~~
steveklabnik
To be clear: unsafe Rust is a _superset_ of safe Rust. Every single safety
check is still turned on in an unsafe block.

However, that superset contains things that are not checked. This has no
effect on any other code though; safe code with a superfluous unsafe block
behaves exactly the same.
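A quick illustration of "superset": wrapping already-safe code in `unsafe` changes nothing about its behaviour (rustc only emits an `unused_unsafe` warning, silenced here), and the borrow checker keeps running inside the block. The function names are made up for the example.

```rust
fn sum_plain(v: &[u32]) -> u32 {
    v.iter().sum()
}

// The same body wrapped in a needless `unsafe` block: identical behaviour.
#[allow(unused_unsafe)]
fn sum_wrapped(v: &[u32]) -> u32 {
    unsafe { v.iter().sum() }
}

fn main() {
    let data = [1, 2, 3, 4];
    assert_eq!(sum_plain(&data), sum_wrapped(&data));
    println!("{}", sum_wrapped(&data));

    // The borrow checker still runs inside `unsafe`. Uncommenting this is
    // rejected exactly as it would be outside the block:
    // unsafe {
    //     let mut n = 0;
    //     let a = &mut n;
    //     let b = &mut n; // error[E0499]: cannot borrow `n` as mutable more than once
    //     *a += 1;
    //     *b += 1;
    // }
}
```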

------
ridiculous_fish
One nasty class of errors in C and C++ is due to environment mismatches in
separate compilation. For example, a struct may conditionally add a field
according to some compilation flag, say, whether ASAN is enabled. If
compilation units disagree on that flag, they can disagree on the size of the
struct, leading to UB.

Does Rust have a way to prevent this class of errors? What ensures that all
units have the same view of a type?

~~~
Rusky
Multiple units don't have their own copies of type definitions to begin with.
The compiler sees them all at once instead, to the point that different
versions of a crate produce different, incompatible types.

At a lower level, symbols are mangled to include a crate identification hash.
This is more of a way to allow multiple versions to coexist though, as the
problem is already solved by that point.

~~~
ridiculous_fish
Thanks for the reply. What do you mean by "different versions of a crate?"
Here's a scenario:

1. Compile a library libA that references type T in CrateX.

2. Add a field to T, and recompile CrateX.

3. Compile an executable a.out that passes a T to libA.

Nobody bumped CrateX's version or recompiled libA. Won't this crash? Or does
this produce a "different version" of CrateX and so it will refuse to link?

~~~
pcwalton
It produces a crate with incompatible symbol names (a "different version").
There's a "strict version hash" derived by hashing a representation of the
structure of a crate's entire external interface. That hash value is appended
to every symbol name as part of the name mangling process.
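A toy model of the idea, purely for intuition: hash a description of the public interface and append the hash to each symbol, so an interface change produces a different symbol name. The `mangle` function, the interface strings, and the output format are all hypothetical; this is not rustc's actual hash or mangling scheme.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy model (hypothetical): append a hash of the crate's interface
/// description to a symbol name.
fn mangle(symbol: &str, interface: &str) -> String {
    let mut h = DefaultHasher::new();
    interface.hash(&mut h);
    format!("{}::h{:016x}", symbol, h.finish())
}

fn main() {
    let v1 = "pub struct S { a: usize } pub fn foo(&S) -> usize";
    let v2 = "pub struct S { a: usize, b: usize } pub fn foo(&S) -> usize";

    let old = mangle("crate_x::foo", v1);
    let new = mangle("crate_x::foo", v2);
    println!("{}\n{}", old, new);

    // In this model, a stale library still referencing `old` would fail to
    // link against the rebuilt crate instead of misreading the layout of `S`.
    assert_ne!(old, new);
}
```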

~~~
comex
Is that actually true anymore? I just tried compiling a simple crate:

    pub struct S {
        a: usize,
    }
    pub fn foo(s: &S) -> usize {
        s.a
    }

…as a dylib, then added another field to S and recompiled. I expected it to
change the symbol names (as shown by nm), but it didn't. The metadata might be
different (I don't know how to display it), but the dynamic linker doesn't
know about that.

~~~
eddyb
That changed with incremental recompilation, to allow reuse across changes.

However, Cargo will still ensure different versions of the same crate have
different symbols, by passing its own hashes to rustc via -C metadata.

~~~
ridiculous_fish
Ah here we go again! What does "different version" mean? Is it explicit
version metadata or something computed from the interface?

~~~
Manishearth
Computed from the interface. The metadata does not understand semver or any of
the higher level versioning tools Cargo exposes to users.

The metadata just contains info on all the types, and their hashes (or
something like that), so if stuff doesn't match you'll know.

This is generally visible from the "expected type Foo but found type Foo"
error, which will often mention you have two versions of the same crate.

Worth mentioning that, unlike C or C++, Rust doesn't have a global name
mangling scheme, so it is totally OK for two crates to each have a top-level
struct Foo (unlike in C++, where you are forced to disambiguate them with
uniquely named namespaces). A side effect is that it is totally OK to link two
versions of the same crate together; Rust will just complain if you try to
mix the types.
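A rough single-file analogue, assuming two modules can stand in for two versions of a crate (real crates are separate compilation units, so this only approximates the situation): both `Foo`s coexist without clashing, yet they are distinct types even with identical definitions.

```rust
use std::any::TypeId;

// Two modules standing in for "version 1" and "version 2" of a crate.
mod foo_v1 {
    pub struct Foo {
        pub x: u32,
    }
}
mod foo_v2 {
    pub struct Foo {
        pub x: u32,
    }
}

/// True if the two `Foo`s are different types as far as Rust is concerned.
fn distinct_types() -> bool {
    TypeId::of::<foo_v1::Foo>() != TypeId::of::<foo_v2::Foo>()
}

fn main() {
    let a = foo_v1::Foo { x: 1 };
    let b = foo_v2::Foo { x: 1 };
    println!("{} {} {}", a.x, b.x, distinct_types());

    // Mixing them is rejected at compile time as a type mismatch:
    // let c: foo_v1::Foo = b;
}
```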

~~~
comex
Are you sure the metadata hash that _Cargo_ computes (and passes with -C
metadata) is based on the AST? I don’t think Cargo tries to parse source
files…

~~~
Manishearth
No, that's a metadata hash it creates (probably from semver info, but also
perhaps file contents) so that it can ask rustc to mangle symbol names.

~~~
eddyb
No file contents, that would defeat incremental recompilation.

------
benmmurphy
it kind of sucks that rust doesn't provide an easier way for Julia to solve
her problem and she has to resort to unsafe. it sounds like she just wants to
cast some bytes to a structure. if rust had c-style serialization from
bytes->structure then this could be done without any unsafe [assuming all the
c types were rust 'safe' which they should be]. but i guess this is a pretty
rare use case.

i guess the reason behind using the c-structs is to avoid manually calculating
the offsets for N different ruby versions you want to support. i guess the
alternative to casting/serialization would be to extract the offsets from the
generated rust bindings which is apparently unsafe in rust as well. heh

~~~
steveklabnik
Casting bytes to a structure is tricky! First of all, unlike C, Rust doesn't
define struct layout. We change the internal layout every so often to add
optimizations, and we can only do this because the layout is left unspecified.
We do have a way to say "use C's layout", if it's a struct you're defining.
You also have things like endianness to worry about; portability is tough here!
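A sketch of what "use C's layout" means in practice. With `#[repr(C)]` the field order and padding follow C's rules, so the size below is fixed; with the default representation the compiler may reorder fields, and whatever it picks today is not a promise for future releases.

```rust
use std::mem::{align_of, size_of};

// Declaration order with C padding: 1 + 3 (pad) + 4 + 1 + 3 (pad) = 12 bytes.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

// Same fields, default representation. Current rustc typically reorders
// these into 8 bytes, but that choice is explicitly not guaranteed.
#[allow(dead_code)]
struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("repr(C): size {} align {}", size_of::<CLayout>(), align_of::<CLayout>());
    println!("default: size {} align {}", size_of::<RustLayout>(), align_of::<RustLayout>());
}
```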

Depending on your situation, you don't have to use unsafe yourself; for
example, the byteorder crate can help here, as can any of the serialization
libraries that can serialize to a binary format.
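For simple cases the standard library alone is enough: `from_le_bytes` (similar in spirit to what byteorder offers) decodes each field with an explicit byte order, with no `unsafe` and no dependence on struct layout. The `Header` type and its fields are hypothetical.

```rust
/// A hypothetical two-field header decoded from little-endian bytes.
#[derive(Debug, PartialEq)]
struct Header {
    magic: u32,
    version: u16,
}

fn parse_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < 6 {
        return None; // too short: reject instead of reading out of bounds
    }
    // Endianness is stated explicitly rather than inherited from the host.
    let magic = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let version = u16::from_le_bytes([bytes[4], bytes[5]]);
    Some(Header { magic, version })
}

fn main() {
    let raw = [0x78, 0x56, 0x34, 0x12, 0x02, 0x00];
    println!("{:?}", parse_header(&raw));
}
```

The cost is writing out the offsets by hand, which is exactly the work the C-struct approach was trying to avoid; the benefit is that the result is portable across hosts and compiler versions.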

------
empath75
Tl;dr:

Code in an unsafe block _may_ generate undefined behavior if it is used
incorrectly, but, if used correctly, will not.

Rust code which is not in an unsafe block cannot generate undefined behavior
(aside from bugs in the compiler/underlying libraries).

~~~
TheDong
Your tl;dr is wrong.

> Code in an unsafe block may generate undefined behavior if it is used
> incorrectly, but, if used correctly, will not.

The point of this was that code in an unsafe block should not be able to
generate undefined behavior no matter how it is used from safe code, otherwise
that unsafe block is unsafe, not safe.

> Rust code which is not in an unsafe block cannot generate undefined behavior

That's false, code outside of an unsafe block can generate undefined behavior
by calling unsafe code that was not written safely.
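To make the distinction concrete: a safe function built on an unchecked read is sound only if no caller, however it calls it, can trigger UB. The name `get_or_zero` is hypothetical.

```rust
/// Sound: the bounds check runs before the `unsafe` block, so any `i`
/// passed from safe code is handled correctly.
fn get_or_zero(v: &[i32], i: usize) -> i32 {
    if i < v.len() {
        // SAFETY: `i < v.len()` was checked on the line above.
        unsafe { *v.get_unchecked(i) }
    } else {
        0
    }
}

// Deleting the `if i < v.len()` check would still compile, and every call
// site would still look like ordinary safe code, but the function would be
// unsound: a safe caller passing an out-of-range `i` would cause UB.

fn main() {
    let v = [10, 20, 30];
    println!("{} {}", get_or_zero(&v, 1), get_or_zero(&v, 99));
}
```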

> tl;dr

Please don't post misleading summaries on nuanced topics, especially when the
original post is short.

~~~
Jhsto
> That's false, code outside of an unsafe block can generate undefined
> behavior by calling unsafe code that was not written safely.

Is there any way to write safe code in Rust then?

~~~
oconnor663
This tldr-counter-tldr discussion is circling around this subtle issue: Unsafe
code has a "contaminating" effect on the module it's in. For example, consider
the length of a Vec. That's just an integer. Any method on Vec could change
that integer without needing an unsafe block. However, increasing the length
incorrectly (for example, past the allocated capacity) will totally cause UB,
because unsafe code in _other methods of Vec_ assumes the length is correct.
Vec is able to expose a safe API, because the length is a private member, and
callers can't set it willy-nilly. But code inside the Vec module needs to be
very careful, even when it's not explicitly using unsafe blocks.
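A minimal sketch of that "contamination", using a hypothetical fixed-capacity vector in place of the real `Vec`: `get` is only sound because every method in the module keeps `len` within capacity, including methods that contain no `unsafe` at all.

```rust
mod fixed_vec {
    /// Hypothetical fixed-capacity vector.
    /// Invariant: `len <= data.len()`, and `data[..len]` are the live elements.
    pub struct FixedVec {
        data: [i32; 4],
        len: usize, // private: code outside this module cannot set it
    }

    impl FixedVec {
        pub fn new() -> Self {
            FixedVec { data: [0; 4], len: 0 }
        }

        /// Contains no `unsafe`, yet it is responsible for the invariant.
        pub fn push(&mut self, x: i32) -> bool {
            if self.len == self.data.len() {
                return false; // full: refuse rather than grow `len`
            }
            self.data[self.len] = x;
            self.len += 1;
            true
        }

        // A method such as `pub fn grow(&mut self) { self.len += 1; }` would
        // also need no `unsafe`, but it would let `get` read past `data`.

        pub fn get(&self, i: usize) -> Option<i32> {
            if i < self.len {
                // SAFETY: relies on the module-wide invariant `len <= data.len()`.
                Some(unsafe { *self.data.get_unchecked(i) })
            } else {
                None
            }
        }
    }
}

fn main() {
    let mut v = fixed_vec::FixedVec::new();
    v.push(7);
    println!("{:?} {:?}", v.get(0), v.get(1));
}
```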

~~~
protomikron
Is it possible to model that efficiently with a total programming language -
or do we need a turing-complete language to write a feature-rich vector (or
array, matrix, `Vector a N`, etc.) library?

I sometimes wonder if we should maybe invest more in non-turing-complete
language research. Obviously Haskell (and Rust) are improvements to the
status quo (critical code can be guarded explicitly via _unsafe_ attributes
and written by experts), but maybe this does not go far enough.

~~~
oconnor663
I'm not sure if this is directly related to what you're thinking about or not,
but I think some of Rust's stdlib has been formally verified. It might be that
more work in formal verification can get us the guarantees we want, without
needing to change the language itself.

------
aisofteng
This all seems rather obvious.

~~~
kbenson
It all depends on the audience. To someone that's been following rust, it's
rather obvious (even if the distinction in meanings of unsafe may be a bit
clearer), but for someone that only vaguely knows that it "allows you to write
safe code" it may have been very useful in quickly getting to the crux of the
matter.

------
cwzwarich
Rust actually has undefined behavior that is triggered by code outside of
unsafe blocks, depending on how a result is used:

[https://github.com/rust-lang/rust/issues/33813](https://github.com/rust-lang/rust/issues/33813)

This basically exposes LLVM's poison semantics (a kind of deferred UB) to Rust
code:

[https://llvm.org/docs/LangRef.html#poisonvalues](https://llvm.org/docs/LangRef.html#poisonvalues)

Code in another module that doesn't even know it is calling an unsafe function
could trigger UB by using this.

~~~
comex
I think you misinterpreted the issue report you linked, which goes along with
what Manishearth was saying about there being two meanings of “unsafe”. In
this case, ptr::offset is declared as an unsafe fn - meaning only unsafe code
can call it - because of the UB issue you mentioned. The reporter was asking
to remove “unsafe”, arguing that Rust generally allows safe code to create
arbitrary raw pointer values (but not dereference them). That is true, but only
because that’s designed not to cause UB. ptr::offset _can_ cause UB, so it has
to remain an unsafe fn - lest it become “unsafe” in the sense of “potentially
dangerous when called from safe code” - and the report was closed.
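For contrast, the standard library also has `wrapping_offset`, which is a safe method precisely because it is defined never to be UB merely to compute: it can produce any pointer value, and the requirements only kick in when you dereference. `offset` itself stays an `unsafe fn`. The helper `pointer_demo` is hypothetical.

```rust
fn pointer_demo(data: &[u8; 3]) -> (bool, u8) {
    let p: *const u8 = data.as_ptr();

    // Safe: `wrapping_offset` may produce a pointer far out of bounds.
    // Computing it is not UB; dereferencing it while out of bounds would be.
    let far = p.wrapping_offset(1000);
    let back = far.wrapping_offset(-998);

    // `offset` is an `unsafe fn`: the caller promises the result stays within
    // (or one past the end of) the same allocation.
    // SAFETY: offset 2 is inside the 3-element array.
    let q = unsafe { p.offset(2) };

    // Comparing raw pointers is safe; reading through one is not.
    // SAFETY: `q` points at data[2].
    (back == q, unsafe { *q })
}

fn main() {
    println!("{:?}", pointer_demo(&[10, 20, 30]));
}
```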

Edit: There are some other known ways of causing UB from safe Rust code, but
they’re considered bugs - and some longstanding ones are on the way to being
fixed in the next few months, thanks to MIR borrow checking and saturating
float->int casts.
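For reference, the float-to-int fix did land: in Rust 1.45 and later, `as` casts from floats to integers saturate instead of being UB for out-of-range values. A short check of the defined behaviour, assuming a current toolchain:

```rust
/// Out-of-range float-to-int casts, all well defined on Rust 1.45+.
fn casts() -> (u8, u8, i32, i32) {
    (
        300.0_f32 as u8,      // too large: saturates to u8::MAX (255)
        (-1.5_f32) as u8,     // negative: saturates to 0
        f32::NAN as i32,      // NaN: becomes 0
        f64::INFINITY as i32, // +inf: saturates to i32::MAX
    )
}

fn main() {
    println!("{:?}", casts());
}
```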

~~~
mahkoh
Segfault:

    #[link_section = ".data"] fn main() { }

