
Unsafe Zig Is Safer Than Unsafe Rust - jfo
http://andrewkelley.me/post/unsafe-zig-safer-than-unsafe-rust.html
======
kzrdude
Rust is a language that offers you lots of compile time checks, _and_ an
escape hatch called unsafe that says “trust the programmer here.” Yes, it is
possible—and easy—to make mistakes in the place where you have asked to be
trusted, not checked.

We have a big pedagogical task ahead of us in teaching safe practices for
unsafe Rust, and defensive coding practices in unsafe Rust.

We should also think about whether we can improve unsafe Rust to be harder to misuse.
There are improvements coming in compile time evaluation, and those can
potentially make the compiler much stronger when it comes to detecting memory
errors in unsafe code at compile time.
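
As a rough illustration of what a compile-time check on unsafe code can look like today, here is a minimal sketch using `const` assertions on stable Rust. The names `AlignedBytes` and `bump_first` are made up for this example, and `Foo` only mirrors the two-field struct from the article; this is one possible pattern, not something the compiler does for you automatically.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
struct Foo {
    a: i32,
    b: i32,
}

// Hypothetical byte buffer whose alignment is stated in its type.
#[repr(C, align(4))]
struct AlignedBytes([u8; 1024]);

// Evaluated at compile time: the build fails if the buffer is not
// aligned strictly enough, or is too small, to hold a Foo.
const _: () = assert!(align_of::<AlignedBytes>() >= align_of::<Foo>());
const _: () = assert!(size_of::<AlignedBytes>() >= size_of::<Foo>());

fn bump_first(buf: &mut AlignedBytes) -> i32 {
    // The cast relies on the two assertions above and on the fact that
    // every bit pattern is a valid i32.
    let foo = unsafe { &mut *(buf.0.as_mut_ptr() as *mut Foo) };
    foo.a += 1;
    foo.a
}

fn main() {
    let mut buf = AlignedBytes([1u8; 1024]);
    println!("{:#x}", bump_first(&mut buf));
}
```

The assertions do not make the cast safe by themselves; they only turn two of the assumptions behind it into build errors if someone later changes either type.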

~~~
Animats
Rust's approach to "unsafe" is to let the programmer do whatever they want.
Having to use this for UNIX-type API calls is kind of lame.

I once proposed extending C to allow talking about array sizes.[1] You'd
define "read" as

    
    
        int read(int fd, char &buf[len], size_t len);
    

The compiler now knows that "buf" is an array with length "len", and can check
calls for "buf" being the right size. The generated code for the call is the
same; this doesn't require array descriptors. It just says which parameter
defines the length of the array.

All the original UNIX calls and most of the Linux ones fit into that simple
model. If the size of something is hard to define simply at an API call, the
API has a problem.

Rust's system for external C calls should be more like that and less about
casts to raw pointers. It's technically possible to fix this in C, and have a
"strict mode", but the political problems are too hard.

[1]
[http://www.animats.com/papers/languages/safearraysforc43.pdf](http://www.animats.com/papers/languages/safearraysforc43.pdf)

~~~
dbaupp
> Rust's system for external C calls should be more like that and less about
> casts to raw pointers.

It seems a rosy-eyed view to think that this would help safety
significantly, and would require a lot of effort: it's likely to be much lower
pay-off than other things, like investing in, say, sanitizers or even just
doing the work of writing safe wrappers for popular C libs, removing C FFI
concerns from most people, who can just use the Rust library.

Specifically, as you say, C doesn't have this information, meaning there's no
way for Rust's (or another language's) FFI to work like this automatically.
Instead, someone will have to annotate the C code, have some extra "notes"
layer, or annotate the imported Rust declarations. Either way, there's a human
element, meaning a place for mistakes to be made. It seems like the less-
duplicative way to do this is to make Rust wrappers that take Rust slices,
since these will be wanted in the end anyway.

~~~
Animats
Of course you want to use Rust slices. Those map directly to the kind of C
array I outlined. If you could declare a C API that way to Rust, you'd get the
mapping without talking about pointers explicitly at all.

What I'm arguing for is a declarative way to talk about C interfaces that is
consistent with Rust's model. This is better than using "unsafe" to construct
C-type raw pointers. Yes, this is more restrictive and there will be some
awful C APIs you can't describe. That's a good indication said C API is
trouble.

~~~
Rusky
What would make this "declarative way to talk about C interfaces" less error
prone than something like this?

    
    
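        use std::os::raw::{c_char, c_int};

        extern "C" {
            fn read(fd: c_int, buf: *mut c_char, len: usize) -> isize;
        }
    
        // Named differently from the extern fn so both fit in one module.
        pub fn read_slice(fd: c_int, buf: &mut [c_char]) -> isize {
            unsafe { read(fd, buf.as_mut_ptr(), buf.len()) }
        }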
    

Further, note that this is insufficient for an idiomatic Rust API. You would
also want to wrap the file descriptor (perhaps not for all C APIs) and the
return value (definitely applies to all C APIs). So it would really look more
like this:

    
    
        pub struct File { fd: c_int }
    
        impl File {
            pub fn read(&self, buf: &mut [u8]) -> Result<usize, ReadError> {
                let r = unsafe { read(self.fd, buf.as_mut_ptr() as *mut c_char, buf.len()) };
                if r == -1 {
                    Err(ReadError::from(errno))
                } else {
                    Ok(r as usize)
                }
            }
        }
    

I can certainly imagine a way to do that declaratively, but not in a way that
helps even this most basic of examples. (Also, note that _constructing_ raw
pointers is completely safe, `as_mut_ptr` for example.)

~~~
Animats
That's not bad. It would be useful to be able to use some kind of "C slice" in
an extern fn declaration, so you could talk about arrays, rather than
pointers. Same function call code, but more Rust-like syntax. Then you don't
need unsafe imperative code at all.

This would put all the memory-risky stuff in declarations of external
functions.

------
deathanatos
I think the Rust example is not how you _should_ write such code. Why not start with
the struct, and cast to a void* or a char* when C code requires it? I.e., the
buggy example becomes:

    
    
      #[derive(Copy, Clone, Debug)]
      #[repr(C)]
      struct Foo {
          a: i32,
          b: i32,
      }
    
      fn main() {
          let mut array = [Foo { a: 0x01010101i32, b: 0x01010101i32 }; 256];
          let foo = &mut array[0];
          foo.a += 1;
      }
    

The unsafe section isn't even required, and the effect is the same. And I
don't think this violates the spirit of his example, either. Consider the
author's first link to a real-world occurrence of this:

    
    
                let size = mem::size_of::<FILE_NAME_INFO>();
                let mut name_info_bytes = vec![0u8; size + MAX_PATH];
                let res = GetFileInformationByHandleEx(handle,
                                                    FileNameInfo,
                                                    &mut *name_info_bytes as *mut _ as *mut c_void,
                                                    name_info_bytes.len() as u32);
    

This is again, IMO, the wrong way to do this. You should just cast a pointer
to an instance of the FILE_NAME_INFO struct to a `*mut c_void`; the structure will
need to use #[repr(C)] and the code will still be unsafe due to the C FFI, but
it will be correct (and a lot simpler). This is the same thing that you would
do in C, were you to call this function:

    
    
      FILE_NAME_INFO file_name_info;
      GetFileInformationByHandleEx(
          handle,
          FileNameInfo,
          &file_name_info,
          sizeof(file_name_info),
      )
    

just in Rust.

~~~
dbaupp
While the approach you suggest usually works well, it doesn't in this case:
FILE_NAME_INFO[1] uses a "flexible array member"[2] (although not the C99
version of it): the caller has to allocate room for a dynamically sized
character array as part of the struct's allocation, and the API writes
directly to the memory after the struct instance. The 'WCHAR FileName[1];'
field at the end of the struct is just a placeholder to allow easy access to
that character array; the length 1 is a lie.

[1]: [https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...](https://msdn.microsoft.com/en-us/library/windows/desktop/aa364388\(v=vs.85\).aspx)

[2]:
[https://en.wikipedia.org/wiki/Flexible_array_member](https://en.wikipedia.org/wiki/Flexible_array_member)
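
To make the layout concrete, here is a minimal sketch of allocating such a struct with correct alignment in Rust. Everything in it is hypothetical: `NameInfo` is a simplified stand-in for FILE_NAME_INFO, and `fake_api` plays the role of the Windows call so the example runs anywhere. The point it tries to show is that the allocation is requested with the struct's alignment, which a plain `Vec<u8>` buffer does not promise.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::mem::{align_of, size_of};
use std::ptr::{addr_of_mut, copy_nonoverlapping};

// Hypothetical stand-in for a C struct that ends in a placeholder array.
#[repr(C)]
struct NameInfo {
    name_len_bytes: u32,
    name: [u16; 1], // really `name_len_bytes / 2` elements long
}

// Hypothetical stand-in for the C API: writes a length and a UTF-16 name.
fn fake_api(info: *mut NameInfo, size: usize) {
    let text: Vec<u16> = "C:\\tmp".encode_utf16().collect();
    assert!(size_of::<NameInfo>() + text.len() * 2 <= size);
    unsafe {
        (*info).name_len_bytes = (text.len() * 2) as u32;
        let dst = addr_of_mut!((*info).name) as *mut u16;
        copy_nonoverlapping(text.as_ptr(), dst, text.len());
    }
}

/// Allocates a NameInfo with room for `max_chars` trailing u16s, lets
/// `fill` write into it, and returns the decoded name.
fn read_name(max_chars: usize, fill: impl FnOnce(*mut NameInfo, usize)) -> String {
    let size = size_of::<NameInfo>() + max_chars * size_of::<u16>();
    // Request the struct's own alignment from the allocator.
    let layout = Layout::from_size_align(size, align_of::<NameInfo>()).unwrap();
    unsafe {
        let raw = alloc_zeroed(layout);
        if raw.is_null() {
            handle_alloc_error(layout);
        }
        let info = raw as *mut NameInfo;
        fill(info, size);
        // Clamp the reported length so the read stays inside the allocation.
        let len = ((*info).name_len_bytes as usize / 2).min(max_chars);
        // Use raw pointers so no reference to the 1-element placeholder is made.
        let first = addr_of_mut!((*info).name) as *const u16;
        let name = String::from_utf16_lossy(std::slice::from_raw_parts(first, len));
        dealloc(raw, layout);
        name
    }
}

fn main() {
    println!("{}", read_name(260, fake_api));
}
```

This still needs `unsafe`, but the size and alignment reasoning lives in one small function instead of at every call site.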

~~~
deathanatos
Ugh. You're absolutely right. I never liked those even in C.

So, it seems like this is relatively easy to do on the _stack_, which is how
the example does it presently. See the link below to my attempt; the stack
allocation is still all safe code, still a single line. However, I presume
that one will want to also create one on the heap, especially since in the
example the author poses it would be a rather large stack allocation, and one
might — quite reasonably — put that on the heap.

My attempt is here: [https://play.rust-lang.org/?gist=1c50b35941506316372da860cae...](https://play.rust-lang.org/?gist=1c50b35941506316372da860caeceba3&version=nightly)

Couldn't avoid the unsafe for that, but, I was able to get rid of the
transmute call, and transmute is a function where the warning on the tin is
"this function is not just unsafe, it is radioactive". But the amount of code
required still felt a bit lacking.

It seems these are an area of active work[1][2] currently.

I think there is still definitely a valid point that the author is hitting —
that encoding _more_ information into the program can allow the compiler to
catch more classes of errors. (This is, after all, the very logic that gave us
Rust.)

[1]: [https://github.com/rust-lang/rfcs/pull/1909](https://github.com/rust-lang/rfcs/pull/1909)

[2]: [https://github.com/rust-lang/rust/issues/18806](https://github.com/rust-lang/rust/issues/18806)

------
steveklabnik
Transmute is like, the most unsafe thing possible. It basically checks if the
two things have the same size, and that's it. You're responsible for
everything else.

See all the warnings and suggested other ways to accomplish things with
[https://doc.rust-lang.org/stable/std/mem/fn.transmute.html](https://doc.rust-lang.org/stable/std/mem/fn.transmute.html)

This is UB because `Foo` is not `#[repr(C)]`, in my understanding. I haven't
checked if it works if you add the repr though. I don't think I'd expect it
to.
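
A small sketch of the "same size, and that's it" point, using a case where the result happens to be well defined (a `u32` and a `[u8; 4]`). The function names are invented for this example; `to_ne_bytes` and `from_ne_bytes` are the safe standard-library equivalents for this particular conversion.

```rust
use std::mem;

// Compiles because both types are 4 bytes wide; the size is the only
// thing the compiler verifies about a transmute.
fn bytes_via_transmute(x: u32) -> [u8; 4] {
    unsafe { mem::transmute::<u32, [u8; 4]>(x) }
}

// Safe equivalent for this particular conversion.
fn bytes_via_std(x: u32) -> [u8; 4] {
    x.to_ne_bytes()
}

fn main() {
    let x = 0x0102_0304u32;
    assert_eq!(bytes_via_transmute(x), bytes_via_std(x));
    assert_eq!(u32::from_ne_bytes(bytes_via_std(x)), x);

    // Rejected at compile time because the sizes differ:
    // let wrong = unsafe { mem::transmute::<u32, [u8; 8]>(x) };
}
```

Nothing here checks alignment or validity of the target type, which is why transmuting a byte array into a struct is a different matter from this integer case.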

~~~
AndyKelley
I changed it to:

    
    
            let foo = &mut array[0] as *mut u8 as *mut Foo;
            (*foo).a += 1;
    

and the IR has the same undefined behavior:
[https://godbolt.org/g/5Bv3FL](https://godbolt.org/g/5Bv3FL)
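
For contrast, a minimal sketch of doing the same cast with a run-time check in Rust. The helper `first_foo` is made up for this example; it refuses the cast when the bytes are too short or misaligned, instead of letting the compiler assume alignment it cannot know.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
struct Foo {
    a: i32,
    b: i32,
}

/// Views the start of `bytes` as a Foo, or returns None if the slice is
/// too short or not aligned for Foo.
fn first_foo(bytes: &mut [u8]) -> Option<&mut Foo> {
    let ptr = bytes.as_mut_ptr();
    if bytes.len() < size_of::<Foo>() || (ptr as usize) % align_of::<Foo>() != 0 {
        return None;
    }
    // Size and alignment were checked above, and any bit pattern is a
    // valid pair of i32s.
    Some(unsafe { &mut *(ptr as *mut Foo) })
}

fn main() {
    let mut array = [1u8; 1024];
    match first_foo(&mut array) {
        Some(foo) => {
            foo.a += 1;
            println!("a = {:#x}", foo.a);
        }
        None => println!("array happened to be misaligned for Foo"),
    }
}
```

Which branch runs depends on where the compiler places the array, which is exactly the information the unchecked cast throws away.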

~~~
steveklabnik
Yeah I mean, to be clear, it's cool zig checks this stuff. Unsafe code is
extremely dangerous, in a variety of ways.

Luckily, outside of FFI, it's very rare to actually need to write it, though
that does of course depend on what exactly you're doing.

We hope, in the future, to basically have tooling here that can detect when
you do something UB, and warn you. As we're still sorting out the memory
model, etc, it's not here yet, but it's certainly on the agenda.

------
vivaan
Clearly both Rust and Zig tackle tough problems and implement solutions that
will have trade offs. I don't think the top answer to a post talking about
Zig's advantages should defensively try to point out how things could be
different in Rust - if only you knew exactly what to do - instead it would be
nice to see more discussion about other areas where Rust is perhaps better
suited than Zig. For instance, you Rust clearly handles memory/pointers better
(?), while maybe Zig is easier to learn?

------
devit
I think both "as" and using "transmute" for non-exceptional circumstances are
mistakes in Rust.

There should instead be a bunch of type-specific cast operators that can check
things like alignment and that what you intended to be a zero-extending
integer cast is not in fact truncating to a smaller integer type, and so on.

It's not too late to deprecate "as" and discourage using "transmute" in favor
of those.
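
For integers, the standard library already has conversions along these lines. A minimal sketch (the wrapper names `widen` and `narrow` are invented for this example; `From` and `TryFrom` are the real traits):

```rust
use std::convert::TryFrom;

// Lossless widening: only exists for pairs of types where it cannot fail.
fn widen(x: u8) -> u32 {
    u32::from(x)
}

// Checked narrowing: reports values that do not fit instead of truncating.
fn narrow(x: u32) -> Option<u8> {
    u8::try_from(x).ok()
}

fn main() {
    // `as` silently truncates: 300 mod 256 is 44.
    assert_eq!(300u32 as u8, 44);

    assert_eq!(widen(200), 200u32);
    assert_eq!(narrow(200), Some(200u8));
    assert_eq!(narrow(300), None);
}
```

This covers integer width only; it says nothing about pointer alignment, which is the case the article is about.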

~~~
edflsafoiewq
This isn't about transmute or having a specific operator that checks
alignment. The point is that the alignment is part of the type in Zig and, to
a lesser degree, it's about having the comptime machinery for zig to decide,
when you offset a &align(4) u8 by an expression, whether the result should
have type &align(1) u8, &align(2) u8, or &align(4) u8.

------
vbernat
The x86 ABI enforces alignment of the stack to 16 bytes. Isn't that enough to
make this particular problem go away?

~~~
cesarb
No. Nothing guarantees that the array is aligned within the stack frame, even
if the stack frame is aligned. What if the compiler introduced a boolean flag
(for instance, a drop flag) immediately before the array, in the same stack
frame?

~~~
mdip
Good point, here. As is often said, when the documentation says "undefined
behavior", it means the compiler can do whatever it wants, including "work
just fine"; and sometimes it'll cause time travel[0]. Hence the "nasal demons"
lore. Often, it'll cause optimizations to be applied that would have otherwise
been avoided resulting in a bug that appears to occur somewhere else and a
programmer to look at the result of execution and ... if it actually continues
executing ... swear a lot. These are especially fun because the problem
frequently won't appear in debug builds.

[0]
[https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=633/)
- worth a read for some entertainment - basically what happens when the
compiler assumes "undefined behavior" can't happen and optimizes accordingly.

------
lossolo
Zig looks very interesting. There is only TODO in memory section in
documentation. From what I understand there is only manual memory management?
I've seen there is a mention about custom allocators, any details? Any RAII
like concept? or full manual memory management?

~~~
jfo
"Zig does not support RAII or operator overloading because both make it very
difficult to tell where function calls happen just by looking at a function
body."

more in the 0.1.1 release notes! [http://ziglang.org/download/0.1.1/release-notes.html](http://ziglang.org/download/0.1.1/release-notes.html)

"Zig's standard library is still very young, but the goal is for every feature
that uses an allocator to accept an allocator at runtime, or possibly at
either compile time or runtime."

more in this wiki! [https://github.com/zig-lang/zig/wiki/Why-Zig-When-There-is-A...](https://github.com/zig-lang/zig/wiki/Why-Zig-When-There-is-Already-CPP%2C-D%2C-and-Rust%3F)

~~~
imtringued
>"Zig does not support RAII or operator overloading because both make it very
difficult to tell where function calls happen just by looking at a function
body."

How about showing an error if you don't call the destructor manually?

------
viperscape
I'd like to see a C equivalent of this, just for comparison's sake with Zig

~~~
tiehuis

      #include <stdint.h>
      #include <string.h>
    	
      typedef struct {
          int32_t a;
          int32_t b;
      } Foo;
    	
      int main(void)
      {
          uint8_t array[1024];
          memset(array, 1, sizeof(array));
    	
          Foo *foo = (Foo*)(&array[0]);
          foo->a += 1;
      }
    
    

Using clang 3.8.0-2. Compiling examples with `clang -S -emit-llvm`.

It appears that the array is aligned to 16, the minimum ABI requirement, by
default? There may be a note of this in the standard; I can't recall off the
top of my head.

    
    
      %array = alloca [1024 x i8], align 16
      ...
      %6 = load i32, i32* %5, align 4
      ...
      store i32 %7, i32* %5, align 4
    
    
    

We can also explicitly specify the alignment required in C11.

    
    
      #include <stdalign.h>
      #include <stdint.h>
      #include <string.h>
    
      typedef struct {
          int32_t a;
          int32_t b;
      } Foo;
    
      int main(void)
      {
          uint8_t alignas(alignof(Foo)) array[1024];
          memset(array, 1, sizeof(array));
    
          Foo *foo = (Foo*)(&array[0]);
          foo->a += 1;
      }
    

Results in the following IR.

    
    
      %array = alloca [1024 x i8], align 4
      ...
      %6 = load i32, i32* %5, align 4
      ...
      store i32 %7, i32* %5, align 4

~~~
bjourne
On 64bit Linux, stack frames are always aligned at 16 byte boundaries. The
first 8 bytes of the frame contains the return address then there are 8 bytes
of padding and then comes the stack allocations.

I think the example is poorly constructed, because it is inconceivable that
the address of the start of an array would not be aligned to sizeof(int*) bytes.

~~~
dbaupp
The example is illustrative enough: all the array needs to be misaligned in
practice is a small value on the stack near it, e.g. if the Rust code has `let
x: u8 = 1;` inserted after the array (or, I imagine, `uint8_t x = 1;` in the
C, etc.), then the array's address is odd.

~~~
bjourne
Why would it be? I tried it with msvc and it always manages to put arrays in
stack frames at 8 byte boundaries. I can't see why a compiler would not do
that.

------
eggy
I’m finding Zig easier to learn and hold in my head so that also helps me
write correct code and safe code. Zig is pretty much one man’s work and is
very impressive. I’m still playing with Rust but I am using Zig as my C
replacement right now.

------
slaymaker1907
One big glaring flaw IMO is that it is not really possible to just turn off
certain checks as opposed to turning them all off. For instance, maybe I need
to call an unsafe C api or something but could still use the borrow checker.

~~~
dbaupp
An `unsafe` block only enables extra features, it doesn't change existing
behaviour of safe Rust. Specifically, it allows calling `unsafe` functions
(FFI and pure Rust `unsafe` ones), dereferencing raw pointers and some minor
other stuff (e.g., inline assembly, some manipulations of packed structs). The
borrow checker still works on references, the trait system still enforces
Send/Sync for concurrency, and the type system still requires things to have
matching types.

It's definitely true that having a one dimensional `unsafe` might seem
unnecessarily powerful in some cases (e.g. a particular unsafe block might
just need to do some pointer offsetting and dereferencing, but no FFI), but it
isn't a "you're on your own" hammer.
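
A minimal sketch of that distinction. The helper names are made up for this example; the commented-out line is the kind of thing the borrow checker still rejects inside an `unsafe` block.

```rust
// An `unsafe fn`: callers must promise that `p.add(i)` is in bounds.
unsafe fn read_at(p: *const i32, i: usize) -> i32 {
    unsafe { *p.add(i) }
}

fn sum_first_two(v: &[i32]) -> i32 {
    assert!(v.len() >= 2);
    let p = v.as_ptr(); // creating a raw pointer is safe
    // The block unlocks calling `read_at`; nothing else changes.
    unsafe { read_at(p, 0) + read_at(p, 1) }
}

fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0];
    unsafe {
        // Still a borrow-checker error if uncommented, `unsafe` or not:
        // v.push(4);
        println!("{}", *first + read_at(v.as_ptr(), 1));
    }
    v.push(4); // fine here: `first` is no longer in use
    println!("{}", sum_first_two(&v));
}
```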

------
didibus
Had never heard of zig. Does it also provide memory safety without a GC like
Rust?

~~~
audunw
No, not to the same extent.

It attempts to make C-style memory management as safe as possible, and also
make it easy to use different memory allocators, but does not attempt advanced
techniques like borrow checkers.

There's also a pretty good metaprogramming system, so it may be possible to
implement some smart memory-management libraries.

Zig is about simplicity. It's a C (and partly C++) replacement, not a Rust
replacement.

Think of it this way: I could easily imagine a TCC-like, dirt-simple, super-
fast compiler for Zig. I'm not sure we'll ever see the same for Rust.

That's nothing against Rust, just saying they have very different goals.

------
anfilt
I find it so funny people are so fixed on bounds checking. A minimal run time
environment is good. It's easier to port and runs faster. Further, there are
more issues than bounds checking.

Also a big part of it is companies don't really pay for quality software. They
just care about software that mostly works, made to cost. I don't see Rust
reducing this cost much. First, one still has to interact with hardware, which
does not fit Rust's/Zig's/(insert safe language) run time model. Secondly, as
soon as you start interacting with software outside of that model, the same
issues apply.

~~~
audunw
> I find it so funny people are so fixed on bounds checking. A minimal run
> time environment is good. It's easier to port and runs faster. Further,
> there are more issues than bounds checking.

Bounds checking on arrays is a compile-time check in Zig. Other forms of
bounds-checking can be disabled in release-mode.

I don't see a single compelling reason why you wouldn't at least want bounds
checking in debug mode. If you're out of bounds, something is wrong, and it's
always better to get an early and precise error about it.

In Zig you can take slices of arrays or pointers, which contain a pointer and
a length. This is not just about safety, it's also a convenience. There's a
lot of usecases where you want to pass around both a pointer and a length.

Considering how many extremely serious bugs have resulted from a lack of
bounds-checking, and considering the relatively low run-time overhead of doing
it (especially with some decent optimizations from the compiler), I don't find
it funny at all.
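
For comparison, Rust slices are the same pointer-plus-length idea. A minimal sketch (the helper `checked_get` is invented for this example):

```rust
// A slice carries its length, so an out-of-range index can be detected.
fn checked_get(data: &[u8], i: usize) -> Option<u8> {
    data.get(i).copied()
}

fn main() {
    let array = [10u8, 20, 30, 40];
    let tail = &array[1..]; // pointer to array[1] plus a length of 3

    assert_eq!(tail.len(), 3);
    assert_eq!(checked_get(tail, 2), Some(40));
    assert_eq!(checked_get(tail, 3), None);

    // Plain indexing with an index computed at run time, e.g. `tail[i]`
    // with i == 3, would panic instead of reading past the end.
}
```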

------
cryptos
If something is specified as "unsafe", it is implemented correctly if it is
unsafe - ask Intel ;-)

------
stochastic_monk
Take off every zig... for great justice!

~~~
jfo
all your codebase are belong to us!

------
irundebian
> we are professionals, and so we do not accept undefined behavior

Lol'd, tell that to the wannabe-elite C programmers.

